Description of Step-Wise Multiple Regression Statistic Test

Introduction

Stepwise regression is a partially automated process of building a model by successively adding or removing variables based solely on the t-statistics of their estimated coefficients. It is especially useful for sifting through large numbers of potential independent variables and for fine-tuning a model by moving variables in or out. If it is used improperly, however, it may converge on a poor model while giving a false sense of security. This paper reviews stepwise regression in detail and illustrates its application in SPSS version 21.

Definition and Detailed Description of 'Stepwise Regression'

According to Investopedia, stepwise regression is the step-by-step, iterative construction of a regression model that involves the automatic selection of independent variables. It can be carried out by testing one independent variable at a time and adding it to the model if it is statistically significant, by including all potential independent variables in the model and removing those that are not statistically significant, or by a combination of both methods (Investopedia US, A Division of ValueClick, Inc., 2012).

Stepwise multiple regression provides a way of selecting predictors of a specific dependent variable on the basis of statistical criteria. Essentially, the procedure determines which of the independent variables is the best predictor, which is the next best, and so on; the emphasis is on finding the best predictor at each stage. When predictors are highly correlated both with one another and with the dependent variable, often one variable is selected as a predictor while the others are excluded.
This does not show that an excluded variable is not a predictor, only that it adds nothing to the prediction already made by the first predictor. Sometimes the best predictor is only marginally better than the next one, and small variations in the procedure can determine which of the two is selected.

There are several variants of multiple regression. Stepwise regression is generally a good option, although all variables can instead be entered simultaneously. Alternatively, all variables can be entered at once and predictors then removed one by one, as long as each removal does not substantially change the overall prediction.

In statistics, stepwise regression refers to regression models in which the choice of predictive variables is carried out by an automatic procedure. Usually this takes the form of a sequence of F-tests, but other criteria are possible, such as adjusted R-square, t-tests, the Akaike information criterion, Mallows' Cp, the Bayesian information criterion or the false discovery rate (Draper and Smith, 1981).

Principal approaches

The main approaches used in stepwise regression are forward selection, backward elimination and bidirectional elimination. Forward selection starts with no variables in the model, tests the addition of each candidate variable using a chosen model comparison criterion, adds the variable (if any) that improves the model the most, and repeats this process until no addition improves the model. Backward elimination starts with all candidate variables, tests the deletion of each variable using a chosen model comparison criterion, deletes the variable (if any) whose removal most improves the model, and repeats the process until no further improvement is possible.
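The forward selection and backward elimination loops described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SPSS's implementation: the model comparison criterion is assumed to be a partial F statistic, the entry and removal threshold of 4.0 is a hypothetical choice (roughly the 5% point of F with 1 and many degrees of freedom), and the data are synthetic, with y depending on x1 and x2 but not on x3. All function names are illustrative.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(data, y, names):
    """Residual sum of squares of an OLS fit of y on an intercept plus the columns in `names`."""
    rows = [[1.0] + [data[v][i] for v in names] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def partial_f(rss_small, rss_big, df_big):
    """F statistic for one extra parameter: drop in RSS relative to the bigger model's MSE."""
    return (rss_small - rss_big) / (rss_big / df_big)


def forward_selection(data, y, f_in=4.0):
    selected, remaining, n = [], list(data), len(y)
    while remaining:
        base = rss(data, y, selected)
        df = n - len(selected) - 2  # residual df after adding one more variable
        scores = {v: partial_f(base, rss(data, y, selected + [v]), df) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < f_in:  # no candidate improves the fit enough
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(data, y, f_out=4.0):
    kept, n = list(data), len(y)
    while kept:
        full = rss(data, y, kept)
        df = n - len(kept) - 1  # residual df of the current model
        scores = {v: partial_f(rss(data, y, [u for u in kept if u != v]), full, df) for v in kept}
        worst = min(scores, key=scores.get)
        if scores[worst] >= f_out:  # every remaining variable is worth keeping
            break
        kept.remove(worst)
    return kept


# Synthetic illustration: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(1)
n = 40
data = {v: [rng.uniform(0, 10) for _ in range(n)] for v in ("x1", "x2", "x3")}
y = [5 + 3 * a - 2 * b + rng.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
print("forward selection:   ", forward_selection(data, y))
print("backward elimination:", backward_elimination(data, y))
```

Both procedures should keep x1 and x2; whether the noise variable x3 also slips in depends on the particular random sample, which shows how a chance variable can enter when the threshold is loose.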
Bidirectional elimination is a combination of forward selection and backward elimination that tests, at each step, for variables to be included or excluded. A widely used algorithm was first proposed by Efroymson (1960). It is an automatic procedure for statistical model selection in cases where there are many potential explanatory variables and no underlying theory on which to base the choice of model. It can be regarded as a variation of forward selection: at each stage, after a new variable is added, a test is made to check whether some variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the measure is (locally) maximized, or when the available improvement falls below some critical value.

Selection criterion

One of the main issues with stepwise regression is that it searches a large space of possible models, so it is prone to overfitting the data. In other words, stepwise regression often fits the sample data much better than it fits new, out-of-sample data. This problem can be mitigated if the criterion for adding or removing a variable is sufficiently strict (Hocking, 1976).

Model accuracy

One way to test the accuracy of models built by stepwise regression is not to rely solely on the model's F-statistic, multiple R or significance, but to assess the model against a data set that was not used to build it. This is usually done by building the model on part of the available data (for example, 70% of the sample) and using the remaining 30% to assess its accuracy. Accuracy is then often measured as the actual standard error (SE), the mean error between the predicted and actual values in the hold-out sample, or the mean absolute percentage error (MAPE).
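The hold-out procedure and the three accuracy measures can be sketched as follows. This is a minimal example under stated assumptions: a single-predictor least-squares fit stands in for whatever model stepwise selection produced, the "actual SE" is taken to be the root mean squared prediction error on the hold-out rows, mean error is computed as actual minus predicted, and the data are synthetic rather than the essay's sample. The function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def accuracy(actual, predicted):
    """Hold-out accuracy: RMSE (used here as the SE), mean error and MAPE in percent."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "se": (sum(e * e for e in errors) / n) ** 0.5,
        "mean_error": sum(errors) / n,
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def holdout_accuracy(xs, ys, train_share=0.7, seed=0):
    """Fit on a random train_share of the rows (70% by default) and score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = round(train_share * len(idx))
    train, test = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return accuracy([ys[i] for i in test], [a + b * xs[i] for i in test])


# Synthetic data (not the essay's sample): cost falls as outside temperature rises.
rng = random.Random(7)
temps = [rng.uniform(0, 60) for _ in range(50)]
costs = [400 - 5 * t + rng.gauss(0, 20) for t in temps]
print(holdout_accuracy(temps, costs))
```

Because the test rows never influence the fitted coefficients, these measures reflect out-of-sample performance, which is the point of the hold-out check.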
Hold-out validation is especially valuable when data are collected in different settings (for example, at different times or in different social contexts) or when models are assumed to be generalizable (Mayers & Forgy, 1963).

Step-wise regression and its applications

The stepwise multiple regression technique is a procedure for choosing independent variables, often called predictor variables, in order to establish a linear relationship with a given dependent variable, often called the predicted variable. The independent variables most highly correlated with the dependent variable are expected to enter the linear equation first, regardless of the researcher's theoretical assumptions or preferences. Researchers typically use this method when conducting exploratory research. The method is considered suitable for large data sets with many candidate independent variables, even more than 100, and the sample size usually recommended for stepwise regression is about fifty times the model's degrees of freedom. It can be used, for instance, in research that lacks a supporting theory, or in research designed to explore a new avenue in any subject, whether a sociological topic, a psychological issue, a marketing problem or financial performance (Barry, 2012).

In engineering, for example, necessity and sufficiency are generally assessed with F-tests. In addition, when planning a computer simulation, an experiment or a scientific survey to collect data for a multiple regression model, one must keep in mind the number of parameters to be estimated, P, and adjust the sample size accordingly. Consider a scientist who wants to explore the causes of rising obesity in a given region.
The variables considered might include the number of fast-food restaurants, daily calorie intake, level of daily exercise, income level, age and so on. A sample is drawn and the data are collected accordingly. Because neither the most predictive nor the least predictive factor is known before the research, stepwise regression can be used to build the model; it simplifies the process and helps identify the most significant predictors of the dependent variable.

SPSS and Step-Wise Multiple Regression Analysis

SPSS provides a variety of statistical analyses and easy data entry modes. To perform the stepwise multiple regression analysis, an online data sample was used, and the number of data points was increased to 25 by randomly assigning values to each variable. The data consist of one dependent variable, 'Heating Cost' (Y), and three predictors or independent variables: 'Mean Outside Temperature' (X1), 'Attic Insulation' (X2) and 'Age of Furnace' (X3). Each variable was given a numerical value in its respective units.

The first step is data entry. When SPSS is started, a window appears that offers several ways to enter data: by typing values, by opening an existing data source or by creating a new database. This first step is shown in the screenshot below.

As the screenshot shows, the method chosen for data entry is typing in the values. After this option is selected and OK is clicked, a new window opens containing the data entry grid. This window has two tabs, Data View and Variable View. The picture below shows the Variable View window.
This screenshot shows that the variables have been entered in the Variable View window. Once the variables have been defined, the next step is to enter the sample values for each variable. The screenshot above shows the data points entered for each variable under consideration.

Once the data have been entered correctly, the next step is the analysis. The menu bar above the data window has several menus, one of which, 'Analyze', is used for data analysis. Because a multiple regression analysis is required, 'Analyze' is clicked, which opens a drop-down menu from which 'Regression' is selected; this opens a further menu from which 'Linear' is selected. These actions are shown in the screenshot below.

After the linear regression option is selected, the following window opens, providing boxes for designating the variables and a selector for the regression method. As shown in the screenshot above, the variables are placed in their respective boxes, Stepwise is chosen as the method, and the Options button is clicked. This opens the new window shown in the screenshot below.

As the screenshot shows, the entry and removal criteria are set in this dialog box. The entry criterion is a significance level of 0.05 (a variable is entered if the significance of its F value is less than .05), and the removal criterion is a significance level of 0.10 (a variable is removed if the significance of its F value is greater than .10). After the required options are set, Continue is clicked and the output is generated.

SPSS Output and Its Interpretation

The analysis corresponds to the following SPSS syntax:

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT HeatingCost
  /METHOD=STEPWISE MeanOutsideTemp AtticInsulation AgeofFurnace.

Descriptive Statistics
      Mean       Std. Deviation   N
Y     209.0400   106.18164        25
X1     37.0000    16.18641        25
X2      6.6000     2.34521        25
X3      7.2800     3.23419        25

The first table shows descriptive statistics for the sample. The mean of the dependent variable, Heating Cost (Y), is $209.04; the mean of X1, Mean Outside Temperature, is 37 °F; the mean of X2, Attic Insulation, is 6.6 inches; and the mean of X3, Age of Furnace, is 7.28 years. The standard deviations of Y, X1, X2 and X3 are 106.18, 16.19, 2.345 and 3.234 respectively. The sample size, shown as N, is 25.

Correlations
                           Y        X1       X2       X3
Pearson Correlation   Y    1.000    -.626    -.230     .484
                      X1   -.626    1.000    -.102    -.424
                      X2   -.230    -.102    1.000     .015
                      X3    .484    -.424     .015    1.000
Sig. (1-tailed)       Y        .     .000     .134     .007
                      X1    .000        .     .314     .017
                      X2    .134     .314        .     .471
                      X3    .007     .017     .471        .
N                     Y       25       25       25       25
                      X1      25       25       25       25
                      X2      25       25       25       25
                      X3      25       25       25       25

The second table shows the correlation matrix, giving the correlation of each variable with every other variable in the model. According to the Pearson correlation coefficients, Y is negatively correlated with X1 (r = -.626) and X2 (r = -.230, not significant at p = .134) and positively correlated with X3 (r = .484); the strongest relationship is between Y and X1. X1 is negatively correlated with each of the other variables. X2 is negatively associated with Y and X1 and only very weakly positively correlated with X3 (r = .015). X3 is negatively correlated with X1 (r = -.424).

ANOVA(a)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    105875.191        1   105875.191    14.784   .001(b)
Residual      164713.769       23   7161.468
Total         270588.960       24
a. Dependent Variable: Y
b. Predictors: (Constant), X1

The third table contains the analysis of variance (ANOVA) summary for the model with X1 as its only predictor, showing that the model is statistically significant at α = 0.05 (F(1, 23) = 14.784, p = .001).
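As a consistency check, the step-1 ANOVA results and the statistics that SPSS reports for the excluded variables (Beta In, t, Sig., partial correlation, tolerance and VIF) can be recomputed from the reported sums of squares and the rounded correlations. The sketch below uses standard formulas for a model with one entered predictor and is not SPSS's own code; the p-value helper, which integrates the Student t density numerically with Simpson's rule, is an assumption of this sketch, and small differences in the third decimal come from working with correlations rounded to three places.

```python
import math

N = 25  # sample size from the Descriptive Statistics table
# Pearson correlations from the Correlations table (rounded to three decimals)
R_Y = {"X1": -0.626, "X2": -0.230, "X3": 0.484}
R_X = {("X1", "X2"): -0.102, ("X1", "X3"): -0.424, ("X2", "X3"): 0.015}


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t via Simpson integration of the density on [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    inner = sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = (density(0) + density(abs(t)) + inner) * h / 3
    return 1 - 2 * area


# Step 1: the candidate most correlated with Y enters first.
first = max(R_Y, key=lambda v: abs(R_Y[v]))

# ANOVA for the one-predictor model, from the reported sums of squares
ss_reg, ss_res = 105875.191, 164713.769
df_res = N - 2
f_stat = ss_reg / (ss_res / df_res)
r_squared = ss_reg / (ss_reg + ss_res)
p_f = t_two_sided_p(math.sqrt(f_stat), df_res)  # F(1, df) is the square of t(df)


def excluded_stats(cand, entered=first):
    """Beta In, t, Sig., partial r, tolerance and VIF for cand, given one entered predictor."""
    r_yc, r_ye = R_Y[cand], R_Y[entered]
    r_ce = R_X.get((entered, cand), R_X.get((cand, entered)))
    tol = 1 - r_ce ** 2  # 1 - R^2 of cand regressed on the entered predictor
    numer = r_yc - r_ye * r_ce
    beta_in = numer / tol  # standardized slope cand would get if entered
    partial = numer / math.sqrt((1 - r_ye ** 2) * tol)
    df = N - 3  # residual df if cand were added as a second predictor
    t = partial * math.sqrt(df / (1 - partial ** 2))
    return {"beta_in": beta_in, "t": t, "sig": t_two_sided_p(t, df),
            "partial": partial, "tolerance": tol, "vif": 1 / tol}


print(f"{first} enters: F = {f_stat:.3f}, p = {p_f:.4f}, R^2 = {r_squared:.3f}")
for cand in ("X2", "X3"):
    s = excluded_stats(cand)
    verdict = "enters" if s["sig"] < 0.05 else "stays out"  # PIN(.05)
    print(cand, {k: round(v, 3) for k, v in s.items()}, verdict)
```

Neither X2 nor X3 reaches the 0.05 entry level, which is consistent with the stepwise procedure stopping after X1. Note also that R = sqrt(R²) ≈ .626 equals the absolute Y-X1 correlation, as it must for a one-predictor model.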
The fifth table (Coefficients) summarizes the correlation coefficients, the standardized and unstandardized coefficients, and the slope of the estimated line.

Excluded Variables(a)
                                                             Collinearity Statistics
Model 1   Beta In    t        Sig.   Partial Correlation   Tolerance   VIF     Minimum Tolerance
X2        -.297(b)   -1.919   .068   -.379                 .990        1.011   .990
X3         .266(b)    1.525   .141    .309                 .820        1.219   .820
a. Dependent Variable: Y
b. Predictors in the Model: (Constant), X1

The sixth table summarizes the excluded variables. Neither met the entry criterion: the significance levels of X2 (p = .068) and X3 (p = .141) are both above 0.05, so neither was added to the model.

Summary of results

The regression analysis was conducted using a sample obtained from a PowerPoint presentation available online. The dependent variable was Heating Cost, and the independent variables were Mean Outside Temperature, Attic Insulation and Age of Furnace. Because the relevance of the selected variables was not known in advance, the stepwise method was chosen to conduct the multiple regression analysis in SPSS version 21. The output comprised a table of descriptive statistics, the correlation matrix, the ANOVA, the model retained after excluding the variables that did not meet the entry criterion and, finally, the statistics for the excluded variables, Attic Insulation and Age of Furnace.

Criticism

Stepwise regression procedures are used in data mining, but they are controversial, and several points of criticism have been raised. A sequence of F-tests is often used to control the inclusion or exclusion of variables, but these tests are carried out on the same data, so there are problems of multiple comparisons, for which many correction criteria have been developed. It is difficult to interpret the p-values associated with these tests, since each is conditional on the previous tests of inclusion and exclusion. The tests themselves are biased, because they are based on the same data.
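A small simulation makes this bias concrete. The sketch assumes pure-noise data (no candidate is related to Y), n = 30 cases, and only the first forward-selection step: it keeps whichever of k candidates is most correlated with Y and then tests that winner with an ordinary t-test, as if it had been chosen in advance. The critical value 2.048 is the two-sided 5% point of Student's t with 28 degrees of freedom; the function names are hypothetical.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def naive_hit_rate(n=30, k=10, trials=1000, seed=42, t_crit=2.048):
    """Share of pure-noise data sets whose best single predictor passes a naive 5% t-test.

    t_crit = 2.048 is the two-sided 5% critical value of t with n - 2 = 28 df;
    it must be changed if n changes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n)]
        # First forward-selection step: keep the candidate most correlated with y.
        best_r = max(abs(corr([rng.gauss(0, 1) for _ in range(n)], y)) for _ in range(k))
        t = best_r * math.sqrt((n - 2) / (1 - best_r ** 2))
        hits += t > t_crit  # test the winner as if it had been pre-specified
    return hits / trials


rate1 = naive_hit_rate(k=1)
rate10 = naive_hit_rate(k=10)
print("1 candidate:  ", rate1)
print("10 candidates:", rate10)
```

With one candidate the naive test should pass about 5% of the time, as intended. With ten independent candidates the expected rate is 1 - 0.95^10, or about 40%, even though every predictor is noise, because the same data are used both to pick the variable and to test it.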
Wilkinson and Dallal (1981) computed percentage points of the multiple correlation coefficient by simulation and showed that a final regression obtained by forward selection, which the F-test declared significant at the 0.001 level, was in fact significant only at the 0.05 level. It is important to consider the number of degrees of freedom used in the entire model, not just the number of independent variables in the resulting fit. The resulting models may also be too small, that is, oversimplifications of the real-world relationships in the data. Critics regard the procedure as a paradigmatic example of data dredging, with intense computation often an inadequate substitute for subject-area expertise (Roecker, 1991).

Conclusion

The stepwise multiple regression method makes it easier to deal with multiple predictors of a given dependent variable. Usually both the data set and the number of predictors are large, and stepwise regression, through its entry and removal criteria, excludes the least statistically significant independent variables from the model. According to some statisticians, however, stepwise regression has a number of drawbacks. These include bias inherent in the process itself, potentially incorrect results, and the need for considerable computing power to build complex regression models through iteration.

References

Barry, C. (2012) Using stepwise regression to explain plant energy usage. Real-World Quality Improvement. Available at http://blog.minitab.com/blog/real-world-quality-improvement/using-minitab-stepwise-regression-to-explain-plant-energy-usage [Accessed 30 Nov 2012].
Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2nd edn. New York: John Wiley & Sons, Inc.
Efroymson, M. A. (1960) "Multiple regression analysis." In Ralston, A. and Wilf, H. S. (eds.), Mathematical Methods for Digital Computers. Wiley.
Hocking, R. R. (1976) "The analysis and selection of variables in linear regression." Biometrics 32.
Investopedia US, A Division of ValueClick, Inc. (2012) Definition of 'Stepwise Regression'. Available at http://www.investopedia.com/terms/s/stepwise-regression.asp#axzz2DmhCElwr [Accessed 30 Nov 2012].
Mayers, J. H. and Forgy, E. W. (1963) "The development of numerical credit evaluation systems." Journal of the American Statistical Association 58(303): 799–806.
Roecker, E. B. (1991) "Prediction error and its estimation for subset-selected models." Technometrics 33: 459–468.
Wilkinson, L. and Dallal, G. E. (1981) "Tests of significance in forward selection regression with an F-to-enter stopping rule." Technometrics 23: 377–380.