Analysis of Time Series Data - Research Paper Example

Provenance of the data: Where did it come from? (How was it originally recorded? By whom? How frequently? Under what conditions?) Where has it been? (What other systems has it passed through? How has it been adjusted, aggregated, averaged, or otherwise massaged?) Is it clean or dirty? (Are there data entry errors? Missing data? Misalignment of time periods? Changes in reporting practices? Bizarre events?) And last but not least, in what units is it measured? (Has it been seasonally adjusted, and if so, how? Is it measured in monthly totals or an annual rate? In nominal or constant (inflation-adjusted) units of currency? Does it represent the current level of something, the absolute change from one period to another, or the percentage change from one period to another? Are the units consistent from one variable to another?)

Variables in your data set should be clearly annotated to indicate their sources, units of measurement, and any problems or peculiarities you are aware of. Assembling, cleaning, adjusting, and documenting the units of the data is often the most tedious step of forecasting, and failure to attend to these mundane details can lead to egregious modeling errors. The good news is that you often learn a good deal in the process, gaining insight into the trends and forces that influence the variables you wish to predict.

Draw the #!*$ picture: Graph the data to get a feel for its qualitative properties. For example, suppose you are analyzing retail sales in the US auto industry. Note that the data are in billions of dollars and not seasonally adjusted ("nsa").
What qualitative features are evident on this graph? You might notice some of the following: a strong general upward trend; a pronounced seasonal pattern; increasing amplitude of the seasonal variations over time; and some evidence of business cycles (downturns in the early 1980s and 1990s).

Statistical stationarity: Statistical forecasting methods depend on the assumption that a time series can be rendered stationary. A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, remain constant over time. Statistical forecasting methods estimate these properties from past values and use them to predict future values, on the assumption that they will remain the same in the future. Statistics such as means, variances, and correlations computed from a non-stationary time series are not meaningful, because they describe only the past, not the future. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. For this reason, much caution is needed when extrapolating regression models fitted to non-stationary data.

Non-stationary time series: Most naturally occurring time series are non-stationary when expressed in their original units of measurement. They exhibit trends, cycles, random walks, and other non-stationary behavior, and they may remain non-stationary even after deflation or seasonal adjustment.

Transforming non-stationary time series: A non-stationary time series can often be converted into a stationary one using mathematical transformations. Predictions for the stationarized series can then be "untransformed," by reversing whatever mathematical transformations were applied, to obtain predictions for the original series.
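As a minimal illustration of the point about non-stationarity (not part of the original paper; the series is artificial), comparing the sample mean of the first and second halves of a trending series shows how such statistics drift over time, while the first differences of the same series are stable:

```python
def half_means(series):
    """Return the means of the first and second halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# A trending (non-stationary) series: the sample mean drifts upward.
trending = [2.0 * t for t in range(100)]
m1, m2 = half_means(trending)   # m1 = 49.0, m2 = 149.0: clearly not constant

# Its first differences are constant (stationary): the mean is stable.
diffs = [trending[t] - trending[t - 1] for t in range(1, len(trending))]
d1, d2 = half_means(diffs)      # both 2.0
```

This crude two-halves comparison is only a sketch of the idea; formal tests of stationarity (e.g., unit-root tests) are used in practice.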
Thus, finding the sequence of transformations needed to stationarize a time series often provides important clues in the search for an appropriate forecasting model.

Trend-stationary time series: A trend-stationary series has a stable long-run trend and reverts back to the trend line following a disturbance. It is stationarized by de-trending: fitting a trend line and then subtracting it from the time series. Alternatively, the time index can be included as an independent variable in a regression or ARIMA model.

Difference-stationary time series: De-trending a difference-stationary series does not render it stationary, because its mean, variance, and autocorrelation remain non-constant in time. Instead, it needs to be transformed into a series of period-to-period and/or season-to-season differences.

Inflation adjustment (deflation): Deflation is important for analyzing economic data. Inflation is a component of growth in any series measured in dollars; deflation isolates real growth. Inflation adjustment can also stabilize the variance of random or seasonal fluctuations and/or highlight cyclical patterns in the data. It is not always necessary when dealing with monetary variables, however: a logarithm transformation can also stabilize the variance, and sometimes it is simpler to forecast the data in nominal terms without inflation adjustment.

Non-monetary series: Inflation adjustment is only appropriate for series measured in units of money. If the series is measured in number of widgets produced, hamburgers served, or percent interest, it makes no sense to deflate. To deflate, divide a monetary time series by a price index, such as the Consumer Price Index (CPI). The original series is measured in "nominal dollars" or "current dollars"; the deflated series is measured in "constant dollars." The Consumer Price Index is probably the best-known US price index, but other price indices may be more appropriate for some data.
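The de-trending step described above can be sketched in a few lines of Python (this is an illustrative sketch, not the paper's own procedure; the series is artificial): fit a least-squares line against the time index and subtract the fitted values.

```python
def fit_trend(series):
    """Ordinary least-squares slope and intercept of y against the time index t."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / \
        sum((t - t_mean) ** 2 for t in range(n))
    a = y_mean - b * t_mean
    return a, b

series = [3.0 + 0.5 * t for t in range(20)]   # pure linear trend, no noise
a, b = fit_trend(series)                       # a = 3.0, b = 0.5
detrended = [y - (a + b * t) for t, y in enumerate(series)]  # all ~0
```

If the series were difference-stationary rather than trend-stationary, the residuals from this fit would still wander, and period-to-period differencing would be needed instead.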
The Producer Price Index and the GDP Implicit Price Deflator are other commonly used indices, and numerous industry-specific indices are also available. The U.S. Bureau of Economic Analysis compiles a wide array of "chain-type" price indices for various kinds of personal consumption goods. A chain-type index is obtained by chaining together monthly, quarterly, or annual changes in relative prices that are adjusted for changes in the composition of the commodity basket, so as to reflect changes in consumer tastes. Use of the "correct" price index matters if you want to know the exact magnitudes of trends in real terms, or if the relevant price history has undergone sudden jumps or significant changes in trend rather than consistent increases over time. However, deflation by a general-purpose index such as the CPI is often adequate for rough estimates of trends in real terms when doing exploratory data analysis or when fitting a forecasting model that adapts to changing trends anyway.

Keep in mind that when you deflate a sales or consumer-expenditure series by a general index such as the CPI, you are not necessarily converting from dollars spent to units sold or consumed; rather, you are converting from dollars spent on one type of good to the equivalent quantities of other consumer goods (e.g., hamburgers and hot dogs) that could have been purchased with the same money. Sometimes this is of interest in its own right, because it reveals growth in relative terms (i.e., relative to the other goods).

Here is a graph of auto sales in nominal dollars plotted alongside the CPI over the last 25 years, where the CPI has been scaled so that the January 1990 value is 1.0. Now here is a graph of auto sales divided by (i.e., deflated by) the CPI. Note that much (though not all) of the upward trend has been removed, accentuating the seasonal and cyclical components of the data.
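The mechanics of deflation are simple enough to sketch directly (the numbers below are made up for illustration and are not the paper's auto sales or CPI data): rescale the price index so the base period equals 1.0, then divide the nominal series by it.

```python
nominal = [100.0, 110.0, 121.0]    # nominal dollars (made-up values)
cpi     = [200.0, 220.0, 242.0]    # hypothetical price-index levels

base  = cpi[0]                              # choose period 0 as the reference point
index = [c / base for c in cpi]             # index is 1.0 in the base period
real  = [y / i for y, i in zip(nominal, index)]  # series in constant dollars

# In this contrived case nominal growth exactly tracks inflation,
# so the deflated series is flat: real is approximately [100.0, 100.0, 100.0].
```

All the apparent 10%-per-period growth in the nominal series was inflation; deflation reveals zero real growth.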
The recessionary periods in the mid-1970s, early 1980s, and early 1990s are especially evident. For modeling purposes, the choice of reference point does not matter, since changing the reference point merely multiplies or divides the whole series by a constant. To move the reference point to a different base year, just divide the whole price-index series by the value of the index at the desired reference date. However, the parameters of a model are easier to interpret if the same reference point is used for all inflation adjustments. The thing to avoid at all costs is having some variables that are inflation-adjusted and others that are not: this introduces apparent nonlinear relationships that are merely artifacts of inconsistent units.

Example: to adjust the auto sales series to 1970 dollars instead of 1990 dollars, divide the CPI series through by its value in January 1970 to obtain a new consumer price index series, CPI70, in which the 1970 value is equal to 1.0; then divide the auto sales series by the CPI70 index. A graph of the auto sales series in 1970 dollars would look identical to the graph in 1990 dollars; only the axis scale numbers would change.

Seasonal adjustment

Multiplicative adjustment: Seasonal variations in a time series are often expressed in percentage terms as a multiplicative seasonal pattern, in which the magnitude of the seasonal variations grows as the series grows over time. A multiplicative seasonal pattern is removed by multiplicative seasonal adjustment, which divides each value of the time series by a seasonal index (a number near 1.0) representing the percentage of normal typically observed in that season. For example, if December's sales are typically 130% of the normal monthly value (based on historical data), then each December's sales would be seasonally adjusted by dividing by 1.3.
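The claim that changing the reference point only rescales the series by a constant can be verified with a two-line check (the index levels below are hypothetical, not actual CPI values):

```python
cpi = {1970: 38.8, 1990: 130.7}    # made-up index levels for two base dates
sales_nominal = 40.0               # a made-up nominal sales figure for 1990

cpi90 = {y: v / cpi[1990] for y, v in cpi.items()}   # 1990 value rescaled to 1.0
cpi70 = {y: v / cpi[1970] for y, v in cpi.items()}   # 1970 value rescaled to 1.0

in_1990_dollars = sales_nominal / cpi90[1990]
in_1970_dollars = sales_nominal / cpi70[1990]

# The two deflated values differ only by the constant factor cpi[1990]/cpi[1970],
# so any graph of the series is identical up to the axis scale.
ratio = in_1990_dollars / in_1970_dollars
```

Because the ratio is the same for every observation, model fits and forecasts are unaffected by the choice of base year.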
Similarly, if January's sales are typically only 90% of normal, then each January's sales would be seasonally adjusted by dividing by 0.9. Thus, December's value is adjusted downward while January's is adjusted upward, correcting for the anticipated seasonal effect. Seasonal indices may remain the same or vary over the years. Here are the multiplicative seasonal indices for AUTOSALE as computed by the Seasonal Decomposition procedure in Statgraphics, and here is the seasonally adjusted version of AUTOSALE obtained by dividing each month's sales value by its estimated seasonal index. The seasonal pattern is gone; what remains are the trend and cyclical components of the data, plus random noise.

Additive adjustment: Additive adjustment is used when the seasonal variations are constant in magnitude, independent of the current average level of the series. A quantity is added to or subtracted from each value of the time series: the absolute amount by which the value in that season of the year tends to be below or above normal, as estimated from past data.

Acronyms: In descriptions of time series in Datadisk and other sources, the acronym SA stands for "seasonally adjusted," whereas NSA stands for "not seasonally adjusted." A seasonally adjusted annual rate (SAAR) is a time series in which each period's value has been adjusted for seasonality and then multiplied by the number of periods in a year, as though the same value had been obtained in every period for a whole year.

The logarithm transformation

Linearization property: Variables that are multiplicatively related and/or growing exponentially over time can be represented with linear models by applying the log function to them. The logarithm of a product equals the sum of the logarithms, so logging converts multiplicative relationships and exponentially growing variables to additive relationships:
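The December/January example and the SAAR definition above can be worked through numerically (the monthly sales figures are invented for illustration):

```python
# Multiplicative seasonal adjustment: divide each value by its seasonal index.
seasonal_index = {"Dec": 1.3, "Jan": 0.9}   # Dec runs 130% of normal, Jan 90%
raw = {"Dec": 26.0, "Jan": 18.0}            # made-up monthly sales

adjusted = {m: raw[m] / seasonal_index[m] for m in raw}
# adjusted is approximately {"Dec": 20.0, "Jan": 20.0}:
# both months turn out to be at the same underlying level.

# A seasonally adjusted annual rate (SAAR) multiplies the adjusted monthly
# value by 12, as though it were sustained for a whole year.
saar_jan = adjusted["Jan"] * 12
```

Here December's raw value is pulled down and January's pulled up, leaving identical adjusted levels, exactly the correction described above.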
LOG(X*Y) = LOG(X) + LOG(Y)

The log transformation converts an exponential growth pattern to a linear growth pattern. It also converts a multiplicative (proportional-variance) seasonal pattern to an additive (constant-variance) seasonal pattern. The following two graphs compare the original data to the logged data.

Requirements of logging and choice of base: The logarithm transformation can be applied only to data that are strictly positive--you can't take the log of zero or a negative number! Also, there are two kinds of logarithms in standard use: "natural" logarithms and base-10 logarithms. The only difference between the two is a scaling constant, which is not really important for modeling purposes. In the notation used here, LOG is the natural log and EXP is its inverse (the natural logarithm base, 2.718..., raised to the given power); the base-10 versions are LOG10 and EXP10. In Excel, the natural logarithm is LN, and LOG is the base-10 logarithm.

First difference of LOG = percentage change: Logging converts absolute differences into relative (i.e., percentage) differences. The series DIFF(LOG(Y)) represents the percentage change in Y from period to period. The percentage change in Y at period t is (Y(t)-Y(t-1))/Y(t-1), which is approximately equal to LOG(Y(t)) - LOG(Y(t-1)) when the percentage change is small; that is, DIFF(Y)/LAG(Y,1) ≈ DIFF(LOG(Y)). In a plot of the percentage change in auto sales against the first difference of its logarithm, the two lines are virtually indistinguishable except at the highest and lowest points.

The poor man's deflator: Logging a series often has an effect very similar to deflating: it dampens exponential growth patterns and reduces heteroscedasticity (i.e., stabilizes the variance). Logging is therefore a "poor man's deflator" that does not require any external data (or any head-scratching about which price index to use).
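The DIFF(LOG(Y)) approximation is easy to check numerically (a quick sketch, not the paper's auto sales data):

```python
import math

y = [100.0, 103.0, 101.0]   # made-up series with a +3% then roughly -2% move

# Exact percentage change: (Y(t) - Y(t-1)) / Y(t-1)
pct = [(y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]

# First difference of the log: LOG(Y(t)) - LOG(Y(t-1))
dlog = [math.log(y[t]) - math.log(y[t - 1]) for t in range(1, len(y))]

# pct[0] = 0.0300 while dlog[0] = 0.0296: nearly equal because the change
# is small. The gap widens for large moves, which is why the lines in the
# plot described above diverge only at the highest and lowest points.
```

For changes of a few percent the two series are practically interchangeable; for a 50% move they would differ noticeably.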
Logging is not exactly the same as deflating--it does not eliminate an upward trend in the data--but it can straighten the trend out so that it can be better fitted by a linear model. If you are going to log the data and then fit a model that implicitly or explicitly uses differencing (e.g., a random walk, exponential smoothing, or ARIMA model), then it is usually redundant to deflate by a price index as well, as long as the rate of inflation changes only slowly: the percentage change measured in nominal dollars will be nearly the same as the percentage change in constant dollars. Mathematically speaking, DIFF(LOG(Y/CPI)) is nearly identical to DIFF(LOG(Y)): the only difference between the two is a very faint amount of noise due to fluctuations in the inflation rate. To demonstrate this point, here is a graph of the first difference of logged auto sales, with and without deflation. By logging rather than deflating, you avoid the need to incorporate an explicit forecast of future inflation into the model: you merely lump inflation together with any other sources of steady compound growth in the original data. Logging the data before fitting a random walk model yields a so-called geometric random walk--i.e., a random walk with geometric rather than linear growth. The geometric random walk is the default forecasting model commonly used for stock price data.

Trend in logged units = percentage growth: Because changes in the natural logarithm are (almost) equal to percentage changes in the original series, the slope of a trend line fitted to logged data is equal to the average percentage growth in the original series.
For example, in the graph of logged auto sales shown above, if you "eyeball" a trend line you will see that the magnitude of logged auto sales increases by about 2.5 (from 1.5 to 4.0) over 25 years, which is an average increase of about 0.1 per year, i.e., 10% per year. It is much easier to estimate this trend from the logged graph than from the original unlogged one! The 10% figure obtained here is nominal growth, including inflation. If we had instead eyeballed a trend line on a plot of logged deflated sales, i.e., LOG(AUTOSALE/CPI), its slope would be the average real percentage growth. Usually the trend is estimated more precisely by fitting a statistical model that explicitly includes a local or global trend parameter, such as a linear trend, random-walk-with-drift, or linear exponential smoothing model. When a model of this kind is fitted in conjunction with a log transformation, its trend parameter can be interpreted as a percentage growth rate.

Errors in logged units = percentage errors: Another useful property of the logarithm is that errors in predicting the logged series can be interpreted as percentage errors in predicting the original series, albeit the percentages are relative to the forecast values rather than the actual values. (Normally one interprets the "percentage error" as the error expressed as a percentage of the actual value, not the forecast value, although the statistical properties of percentage errors are usually very similar regardless of whether the percentages are calculated relative to actual values or forecasts.) Thus, if you use least-squares estimation to fit a linear forecasting model to logged data, you are implicitly minimizing mean squared percentage error rather than mean squared error in the original units--which is probably a good thing if the log transformation was appropriate in the first place. And if you look at the error statistics in logged units, you can interpret them as percentages.
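The "trend in logged units = percentage growth" rule can be verified on a synthetic series (an illustrative sketch, not the AUTOSALE data): build a series growing exactly 10% per year, log it, and fit a least-squares slope against the time index.

```python
import math

years = range(25)
series = [10.0 * 1.10 ** t for t in years]   # made-up series growing 10%/year
logged = [math.log(v) for v in series]

# Ordinary least-squares slope of the logged values against the time index:
n = len(logged)
t_mean = (n - 1) / 2
y_mean = sum(logged) / n
slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, logged)) / \
        sum((t - t_mean) ** 2 for t in years)

# slope equals log(1.10), about 0.0953, i.e., roughly 10% average growth
# per year, recovered directly from the logged trend line.
```

The slope is log(1.10) rather than exactly 0.10 because log changes only approximate percentage changes; for growth rates this small, the two are close.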
For example, the standard deviation of the errors in predicting a logged series is essentially the standard deviation of the percentage errors in predicting the original series, and the mean absolute error (MAE) in predicting a logged series is essentially the mean absolute percentage error (MAPE) in predicting the original series.

Statgraphics tip: In the Forecasting procedure in Statgraphics, the error statistics shown on the Model Comparison report are all in untransformed (i.e., original) units, to facilitate comparison among models regardless of whether they used different transformations. (This is a very useful feature of the Forecasting procedure--in most statistical software it is hard to get a head-to-head comparison of models with and without a log transformation.) However, whenever a regression or ARIMA model is fitted in conjunction with a log transformation, the standard-error-of-the-estimate or white-noise-standard-deviation statistics on the Analysis Summary report refer to the transformed (logged) errors, in which case they are essentially RMS percentage errors.
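The MAE-versus-MAPE equivalence stated above can be demonstrated on a small made-up set of actuals and forecasts (an illustrative sketch, not Statgraphics output):

```python
import math

actual   = [100.0, 200.0, 150.0]   # made-up actual values
forecast = [105.0, 190.0, 153.0]   # made-up forecasts, errors of a few percent

# MAE of the errors in logged units:
mae_logged = sum(abs(math.log(a) - math.log(f))
                 for a, f in zip(actual, forecast)) / len(actual)

# MAPE in the original units (percentages relative to actual values):
mape = sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

# mae_logged is about 0.040 and mape is about 0.040: for errors of a few
# percent, the MAE in logged units is essentially the MAPE.
```

The two diverge only when individual errors become large, which is also when a percentage-error interpretation starts to break down anyway.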