Analysis of Time Series Data - Research Paper Example

Summary
The author of the paper "Analysis of Time Series Data" argues in a well-organized manner that variables in the data set should be clearly annotated to indicate their sources, units of measurement, and any problems or peculiarities you are aware of…

Extract of sample "Analysis of Time Series Data"

of Data: Where did it come from? (How was it originally recorded? By whom? How frequently? Under what conditions?) Where has it been? (What other systems has it passed through? How has it been adjusted, aggregated, averaged, or otherwise massaged?) Is it clean or dirty? (Are there data entry errors? Missing data? Misalignment of time periods? Changes in reporting practices? Bizarre events?) And last but not least, in what units is it measured? (Has it been seasonally adjusted, and if so, how? Is it measured in monthly totals or at an annual rate? In nominal or constant (inflation-adjusted) units of currency? Does it represent the current level of something, the absolute change from one period to another, or the percentage change from one period to another? Are the units consistent from one variable to another?)
Variables in your data set should be clearly annotated to indicate their sources, units of measurement, and any problems or peculiarities you are aware of. Assembling, cleaning, adjusting, and documenting the units of the data is often the most tedious step of forecasting, and failure to attend to these mundane details may lead to egregious modeling errors. The good news is that you often learn a good deal in the process, gaining insight into the trends and forces that influence the variables you wish to predict.
Draw the #!*$ picture: Graph the data to get a feel for its qualitative properties. For example, suppose you are analyzing retail sales in the US auto industry. Note that the data are in billions of dollars and are not seasonally adjusted ("nsa").
What qualitative features are evident on this graph? You might notice some of the following:
  • A strong general upward trend
  • A pronounced seasonal pattern
  • Increasing amplitude of the seasonal variations over time
  • Some evidence of business cycles (downturns in the early 1980s and 1990s)
Statistical stationarity: Most statistical forecasting methods rest on the assumption that a time series can be rendered approximately stationary. A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, remain constant over time. Statistical forecasting methods estimate these properties from past values and use them to predict future values, on the premise that they will remain the same in the future. Statistics such as means, variances, and correlations computed from a non-stationary time series are not meaningful as descriptions of future behavior, because they describe only the past. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. For this reason, be cautious about extrapolating regression models fitted to non-stationary data.
Non-Stationary Time Series: Most naturally occurring time series, however, are non-stationary when expressed in their original units of measurement. They exhibit trends, cycles, random-walking, and other non-stationary behavior, and they typically remain non-stationary even after deflation or seasonal adjustment.
Transforming Non-Stationary Time Series: A non-stationary time series can often be converted into a stationary one through mathematical transformations. Predictions for the stationarized series can then be "untransformed," by reversing whatever mathematical transformations were previously used, to obtain predictions for the original series.
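The transform-then-untransform idea can be sketched with the simplest candidate transformation, a first difference. This is only an illustration: the series values are made up, the helper names `difference` and `undifference` are hypothetical, and forecasting the next change by the average past change is an assumption chosen for brevity, not a recommended model.

```python
from itertools import accumulate


def difference(series):
    """Return the period-to-period changes y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Reverse difference(): rebuild the levels by cumulatively summing changes."""
    return list(accumulate(diffs, initial=first_value))


# Hypothetical trending series (made-up numbers, not the auto sales data).
levels = [100.0, 104.0, 109.0, 113.0, 118.0]

changes = difference(levels)                 # the transformed series
restored = undifference(levels[0], changes)  # reversing the transformation

# Forecast in transformed units (here: the average change, an assumption),
# then "untransform" by adding the forecast change to the last known level.
mean_change = sum(changes) / len(changes)
next_level_forecast = levels[-1] + mean_change
```

The differenced series fluctuates around a fixed level even though the original series trends upward, and reversing the difference recovers the original levels exactly.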
Finding the sequence of transformations needed to stationarize a time series therefore often provides important clues in the search for an appropriate forecasting model.
Trend-Stationary Time Series: A trend-stationary series has a stable long-run trend and tends to revert to the trend line following a disturbance. It can be stationarized by de-trending, which involves fitting a trend line and then subtracting it from the series. Alternatively, the time index can be included as an independent variable in a regression or ARIMA model.
Difference-Stationary Time Series: De-trending does not render a difference-stationary series stationary, because its mean, variance, and autocorrelation still vary over time. Such a series needs to be transformed into a series of period-to-period and/or season-to-season differences.
Inflation adjustment or Deflation
Deflation is important for analyzing economic data. Inflation is a component of growth in any series measured in dollars, and deflating the series reveals the real growth, if any. Inflation adjustment can also stabilize the variance of random or seasonal fluctuations and/or highlight cyclical patterns in the data. It is not always necessary when dealing with monetary variables: a logarithm transformation can sometimes be used instead to stabilize the variance, and sometimes it is simpler to forecast the data in nominal terms without any inflation adjustment.
Non-Monetary Series: Inflation adjustment is only appropriate for series measured in units of money. If the series is measured in number of widgets produced, hamburgers served, or percent interest, it makes no sense to deflate.
To deflate a monetary time series, divide it by a price index such as the Consumer Price Index (CPI). The original series is measured in "nominal dollars" or "current dollars," and the deflated series is measured in "constant dollars." The Consumer Price Index is probably the best-known US price index, but other price indices may be appropriate for some data.
The Producer Price Index and the GDP Implicit Price Deflator are other commonly used indices, and numerous industry-specific indices are also available. The U.S. Bureau of Economic Analysis compiles a wide array of "chain-type" price indices for various kinds of personal consumption goods. A chain-type index is obtained by chaining together monthly, quarterly, or annual changes in relative prices that are adjusted for changes in the composition of the commodity basket, so as to reflect changes in consumer tastes. Use of the "correct" price index is important if you are interested in knowing the exact magnitudes of trends in real terms and/or if the relevant price history has undergone sudden jumps or significant changes in trend rather than consistent increases over time. However, deflation by a general-purpose index such as the CPI is often adequate for rough estimates of trends in real terms when doing exploratory data analysis or when fitting a forecasting model that adapts to changing trends anyway.
Keep in mind that when you deflate a sales or consumer expenditures series by a general index such as the CPI, you are not necessarily converting from dollars spent to units sold or consumed. Rather, you are converting from dollars spent on one type of good to equivalent quantities of other consumer goods (e.g., hamburgers and hot dogs) that could have been purchased with the same money. Sometimes this is of interest in its own right because it reveals growth in relative terms (i.e., relative to the other goods).
Here is a graph of auto sales in nominal dollars plotted alongside the CPI over the last 25 years, where the CPI has been scaled so that the January 1990 value is 1.0. Now here is a graph of auto sales divided by (i.e., deflated by) the CPI. Note that much (though not all) of the upward trend has been removed, accentuating the seasonal and cyclical components of the data.
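The two steps just described, scaling the price index so that a chosen reference period equals 1.0 and then dividing the nominal series by it, can be sketched as follows. The numbers are invented for illustration (they are not actual CPI or auto sales figures), and the function names `rescale_index` and `deflate` are hypothetical.

```python
def rescale_index(index, base_position):
    """Scale a price index so that its value at base_position equals 1.0."""
    base = index[base_position]
    return [value / base for value in index]


def deflate(nominal, price_index):
    """Convert nominal (current-dollar) values to constant dollars."""
    if len(nominal) != len(price_index):
        raise ValueError("series and price index must cover the same periods")
    return [n / p for n, p in zip(nominal, price_index)]


# Made-up figures for three consecutive periods.
nominal_sales = [20.0, 22.0, 24.2]   # nominal dollars
raw_cpi = [125.0, 130.0, 137.5]      # index in its published units

cpi = rescale_index(raw_cpi, base_position=0)  # first period becomes 1.0
real_sales = deflate(nominal_sales, cpi)       # dollars of the base period

nominal_growth = nominal_sales[-1] / nominal_sales[0] - 1
real_growth = real_sales[-1] / real_sales[0] - 1
```

In this toy data the nominal series grows by about 21% while the deflated series grows by about 10%, which is the sense in which deflation removes the inflationary part of the trend.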
In the deflated series, the recessionary periods of the mid-1970s, early 1980s, and early 1990s are especially evident.
For modeling purposes, the choice of a reference point doesn't matter, since changing the reference point merely multiplies or divides the whole series by a constant. To move the reference point to a different base year, you would just divide the whole price index series by the value of the index at the desired reference date. However, the parameters of a model are easier to interpret if the same reference point is used for all inflation adjustments. The thing you wish to avoid at all costs is having some variables that are inflation-adjusted and others that aren't: this will introduce apparent nonlinear relationships that are merely artifacts of inconsistent units.
Example: Adjust the auto sales series to 1970 dollars instead of 1990 dollars. The CPI series is divided through by its original value in January 1970 to obtain a new consumer price index series, called CPI70, in which the January 1970 value is equal to 1.0. The auto sales series is then divided by the CPI70 index. A graph of the auto sales series in 1970 dollars would look identical to the graph in 1990 dollars; only the axis scale numbers would change.
Seasonal adjustment
Multiplicative adjustment: Many time series show seasonal variations that are roughly constant in percentage terms, so that the magnitude of the seasonal variations increases as the series grows over time. This is a multiplicative seasonal pattern. It is removed by multiplicative seasonal adjustment, which divides each value of the time series by a seasonal index (a number in the vicinity of 1.0) that represents the percentage of normal typically observed in that season. For example, if December's sales are typically 130% of the normal monthly value (based on historical data), then each December's sales would be seasonally adjusted by dividing by 1.3. Similarly, if January's sales are typically only 90% of normal, then each January's sales would be seasonally adjusted by dividing by 0.9. Thus, December's value would be adjusted downward while January's would be adjusted upward, correcting for the anticipated seasonal effect. The seasonal indices may remain the same from year to year or vary over the years.
Here are the multiplicative seasonal indices for AUTOSALE as computed by the Seasonal Decomposition procedure in Statgraphics. Now here is the seasonally adjusted version of AUTOSALE, obtained by dividing each month's sales value by its estimated seasonal index. The seasonal pattern is gone, and what remains are the trend and cyclical components of the data, plus random noise.
Additive adjustment: This is used when the seasonal variations are roughly constant in magnitude, independent of the current average level of the series. A quantity is added to or subtracted from each value of the time series. This quantity is the absolute amount by which the value in that season of the year tends to be below or above normal, as estimated from past data.
Acronyms: When examining the descriptions of time series in Datadisk and other sources, the acronym SA stands for "seasonally adjusted," whereas NSA stands for "not seasonally adjusted." A seasonally adjusted annual rate (SAAR) is a time series in which each period's value has been adjusted for seasonality and then multiplied by the number of periods in a year, as though the same value had been obtained in every period for a whole year.
The logarithm transformation
Linearization property: Variables that are multiplicatively related and/or growing exponentially over time can be represented with linear models by applying the log function to them. The logarithm of a product equals the sum of the logarithms, so logging converts multiplicative relationships and exponential growth into additive relationships and linear growth.
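Both ideas, multiplicative seasonal adjustment and the way logging turns a multiplicative relationship into an additive one, can be seen in a small sketch. The seasonal indices are the 130% and 90% examples from the text; the sales figures and the names `SEASONAL_INDEX`, `seasonally_adjust`, and `annual_rate` are hypothetical.

```python
import math

# Seasonal indices from the text's examples: December is typically 130% of
# normal and January 90% of normal. Real indices would be estimated from data.
SEASONAL_INDEX = {"Jan": 0.9, "Dec": 1.3}


def seasonally_adjust(value, month):
    """Multiplicative adjustment: divide by that month's seasonal index."""
    return value / SEASONAL_INDEX[month]


def annual_rate(adjusted_value, periods_per_year=12):
    """Seasonally adjusted annual rate (SAAR) from one adjusted period."""
    return adjusted_value * periods_per_year


# Made-up monthly sales figures (not the AUTOSALE data).
dec_sales = 26.0
jan_sales = 18.0

dec_sa = seasonally_adjust(dec_sales, "Dec")  # adjusted downward
jan_sa = seasonally_adjust(jan_sales, "Jan")  # adjusted upward
dec_saar = annual_rate(dec_sa)

# Linearization: raw value = adjusted level * seasonal index, so in logged
# units the seasonal effect is an additive term rather than a multiplier.
log_raw = math.log(dec_sales)
log_parts = math.log(dec_sa) + math.log(SEASONAL_INDEX["Dec"])
```

Here both months adjust to roughly the same underlying level, and `log_raw` matches `log_parts` up to floating-point rounding.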
LOG(X*Y) = LOG(X) + LOG(Y)
The log transformation converts an exponential growth pattern to a linear growth pattern. It also converts a multiplicative (proportional-variance) seasonal pattern to an additive (constant-variance) seasonal pattern. The following two graphs compare the original data to the logged data.
Requirements of logging and choice of base: The logarithm transformation can be applied only to data that are strictly positive: you can't take the log of zero or a negative number! Also, there are two kinds of logarithms in standard use: "natural" logarithms and base-10 logarithms. The only difference between the two is a scaling constant, which is not really important for modeling purposes. In the notation used here, LOG is the natural logarithm and EXP(Y) is its inverse (the natural logarithm base, 2.718..., raised to the Yth power); the base-10 logarithm and its inverse are LOG10 and EXP10. In Excel, the natural logarithm is LN and LOG is the base-10 logarithm.
First difference of LOG = percentage change: Logging converts absolute differences into relative (i.e., percentage) differences. The series DIFF(LOG(Y)) represents the percentage change in Y from period to period. The percentage change in Y at period t is (Y(t)-Y(t-1))/Y(t-1), which is approximately equal to LOG(Y(t)) - LOG(Y(t-1)) when the percentage change is small. That is, DIFF(Y)/LAG(Y,1) ≈ DIFF(LOG(Y)).
In a plot of the percentage change in auto sales versus the first difference of its logarithm, the blue and red lines are virtually indistinguishable except at the highest and lowest points.
The poor man's deflator: Logging a series often has an effect very similar to deflating: it dampens exponential growth patterns and reduces heteroscedasticity (i.e., stabilizes variance). Logging is therefore a "poor man's deflator" that does not require any external data (or any head-scratching about which price index to use).
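The claim above that DIFF(LOG(Y)) approximates the period-to-period percentage change can be checked numerically. The series below is made up, with two small moves and one deliberately large jump, and the function names `pct_change` and `diff_log` are hypothetical stand-ins for DIFF(Y)/LAG(Y,1) and DIFF(LOG(Y)).

```python
import math


def pct_change(series):
    """DIFF(Y)/LAG(Y,1): change relative to the previous period's value."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def diff_log(series):
    """DIFF(LOG(Y)) using natural logs; values must be strictly positive."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


# Made-up series: +2%, about -1%, then a jump of roughly +49%.
series = [100.0, 102.0, 101.0, 150.0]

pct = pct_change(series)
dlog = diff_log(series)

# Absolute gap between the two measures for each period.
gaps = [abs(p - d) for p, d in zip(pct, dlog)]
```

For the two small changes the gap is well under a tenth of a percentage point, while for the large jump it is several percentage points, consistent with the remark that the two lines differ only at the highest and lowest points. Because `math.log` raises `ValueError` for zero or negative input, the sketch also enforces the strictly-positive requirement.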
Logging is not exactly the same as deflating, since it does not eliminate an upward trend in the data, but it can straighten the trend out so that it can be better fitted by a linear model. If you're going to log the data and then fit a model that implicitly or explicitly uses differencing (e.g., a random walk, exponential smoothing, or ARIMA model), then it is usually redundant to deflate by a price index, as long as the rate of inflation changes only slowly: the percentage change measured in nominal dollars will be nearly the same as the percentage change in constant dollars. Mathematically speaking, DIFF(LOG(Y/CPI)) is nearly identical to DIFF(LOG(Y)): the only difference between the two is a very faint amount of noise due to fluctuations in the inflation rate. To demonstrate this point, here's a graph of the first difference of logged auto sales, with and without deflation.
By logging rather than deflating, you avoid the need to incorporate an explicit forecast of future inflation into the model: you merely lump inflation together with any other sources of steady compound growth in the original data. Logging the data before fitting a random walk model yields a so-called geometric random walk, i.e., a random walk with geometric rather than linear growth. A geometric random walk is the default forecasting model commonly used for stock price data.
Trend in logged units = percentage growth: Because changes in the natural logarithm are (almost) equal to percentage changes in the original series, it follows that the slope of a trend line fitted to logged data is approximately equal to the average percentage growth in the original series.
For example, in the graph of logged auto sales shown above, if you "eyeball" a trend line you will see that the magnitude of logged auto sales increases by about 2.5 (from 1.5 to 4.0) over 25 years, which is an average increase of about 0.1 per year, i.e., 10% per year. It is much easier to estimate this trend from the logged graph than from the original unlogged one! The 10% figure obtained here is nominal growth, including inflation. If we had instead eyeballed a trend line on a plot of logged deflated sales, i.e., LOG(AUTOSALE/CPI), its slope would be the average real percentage growth. Usually the trend is estimated more precisely by fitting a statistical model that explicitly includes a local or global trend parameter, such as a linear trend, random-walk-with-drift, or linear exponential smoothing model. When a model of this kind is fitted in conjunction with a log transformation, its trend parameter can be interpreted as a percentage growth rate.
Errors in logged units = percentage errors: Another interesting property of the logarithm is that errors in predicting the logged series can be interpreted as percentage errors in predicting the original series, although the percentages are relative to the forecast values, not the actual values. (Normally one interprets the "percentage error" to be the error expressed as a percentage of the actual value, not the forecast value, although the statistical properties of percentage errors are usually very similar regardless of whether the percentages are calculated relative to actual values or forecasts.) Thus, if you use least-squares estimation to fit a linear forecasting model to logged data, you are implicitly minimizing (approximately) the mean squared percentage error, rather than the mean squared error in the original units, which is probably a good thing if the log transformation was appropriate in the first place. And if you look at the error statistics in logged units, you can interpret them as percentages.
For example, the standard deviation of the errors in predicting a logged series is essentially the standard deviation of the percentage errors in predicting the original series, and the mean absolute error (MAE) in predicting a logged series is essentially the mean absolute percentage error (MAPE) in predicting the original series.
Statgraphics tip: In the Forecasting procedure in Statgraphics, the error statistics shown on the Model Comparison report are all in untransformed (i.e., original) units to facilitate a comparison among models, regardless of whether they have used different transformations. (This is a very useful feature of the Forecasting procedure: in most stat software it is hard to get a head-to-head comparison of models with and without a log transformation.) However, whenever a regression model or an ARIMA model is fitted in conjunction with a log transformation, the standard-error-of-the-estimate or white-noise-standard-deviation statistics on the Analysis Summary report refer to the transformed (logged) errors, in which case they are essentially the RMS percentage errors.
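To make the two logged-unit properties above concrete (trend slope as a growth rate, and logged errors as percentage errors relative to the forecasts), here is a minimal sketch. It assumes an ordinary least-squares straight line fitted to the natural log of a made-up annual series; the helper `fit_line` and all numbers are hypothetical and unrelated to Statgraphics output.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up annual series growing at roughly 10% per year.
years = [0, 1, 2, 3, 4, 5]
sales = [10.0, 11.2, 11.9, 13.5, 14.4, 16.3]

log_sales = [math.log(y) for y in sales]
intercept, slope = fit_line(years, log_sales)

# Fitted values in logged units, then untransformed with exp().
log_fits = [intercept + slope * t for t in years]
forecasts = [math.exp(f) for f in log_fits]

# Errors in logged units versus percentage errors relative to the forecasts.
log_errors = [ly - lf for ly, lf in zip(log_sales, log_fits)]
pct_errors = [(y - f) / f for y, f in zip(sales, forecasts)]

mae_log = sum(abs(e) for e in log_errors) / len(log_errors)
mape_vs_forecast = sum(abs(e) for e in pct_errors) / len(pct_errors)
```

With this data the fitted slope is a little under 0.1, i.e., growth of a bit less than 10% per year, and `mae_log` and `mape_vs_forecast` agree closely because every error is small.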
Cite this document
(“Analysis of time series data Research Paper Example | Topics and Well Written Essays - 3000 words”, n.d.)
Analysis of time series data Research Paper Example | Topics and Well Written Essays - 3000 words. Retrieved from https://studentshare.org/business/1510400-analysis-of-time-series-data
