Statistical Methods Case Study Example

Assignment Part 1

(a) A regression is run with yrsed as the response variable and dist, bytest, female, black, hispanic, incomehi, dadcoll, mumcoll and cue80 as the predictors. The estimated regression equation is

$$\widehat{yrsed} = 9.005 - 0.016\,dist + 0.085\,bytest + 0.134\,female + 0.281\,black + 0.537\,hispanic + 0.230\,incomehi + 0.704\,dadcoll + 0.461\,mumcoll + 0.010\,cue80$$

Table 1 shows the estimated regression coefficients with their standard errors and significance levels, together with the measures of fit.

Table 1: Regression of yrsed on all nine predictors

Predictor    Estimate   s.e.    Significance
Intercept     9.005     0.387   ***
dist         -0.016     0.025
bytest        0.085     0.006   ***
female        0.134     0.101
black         0.281     0.150   *
hispanic      0.537     0.145   ***
incomehi      0.230     0.121   *
dadcoll       0.704     0.149   ***
mumcoll       0.461     0.161   ***
cue80         0.010     0.018

*** significant at the 1% level; * significant at the 10% level.
Measures of fit: R² = 25%, adjusted R² = 24.4%, standard error of regression = 1.58.

(b) A simple regression of yrsed on dist alone gives

$$\widehat{yrsed} = 14.028 - 0.086\,dist$$

The negative slope indicates that completed years of education fall as the distance to the nearest 4-year college rises, which suggests that access to a 4-year college plays a role in furthering education. Table 2 shows the estimates and measures of fit.

Table 2: Regression of yrsed on dist only

Predictor    Estimate   s.e.    Significance
Intercept    14.028     0.074   ***
dist         -0.086     0.028   ***

Measures of fit: R² = 0.9%, adjusted R² = 0.8%, standard error of regression = 1.809.

dist is now significant at the 1% level, whereas it was not significant in the full model (Table 1). The simple regression explains only 0.9% of the total variability of yrsed, compared with 25% for the full model, and the standard error of the regression rises from 1.58 to 1.809. Most importantly, the estimated coefficient on dist is more than five times larger in magnitude here (-0.086 against -0.016). This shift once the other regressors are added is the direct sign of a significant omitted variable bias: part of what the simple regression attributes to dist is due to variables omitted from it.

(c) dadcoll and mumcoll are both significant at the 1% level in predicting yrsed (Table 1), and both have positive coefficients.
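The significance marks in Table 1, including the 1% significance of dadcoll and mumcoll cited in (c), can be checked from the estimates and standard errors alone. The sketch below is a minimal check under two assumptions. First, with 990 residual degrees of freedom, the Student t distribution is replaced by its standard normal limit, computed with `math.erfc`. Second, the star convention (*** below 1%, ** below 5%, * below 10%) is assumed rather than stated in the table.

```python
import math

# Estimates and standard errors copied from Table 1 (full model).
TABLE_1 = {
    "Intercept": (9.005, 0.387),
    "dist": (-0.016, 0.025),
    "bytest": (0.085, 0.006),
    "female": (0.134, 0.101),
    "black": (0.281, 0.150),
    "hispanic": (0.537, 0.145),
    "incomehi": (0.230, 0.121),
    "dadcoll": (0.704, 0.149),
    "mumcoll": (0.461, 0.161),
    "cue80": (0.010, 0.018),
}


def two_sided_p(t: float) -> float:
    """Two-sided p-value using the large-sample standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def stars(p: float) -> str:
    """Assumed convention: *** below 1%, ** below 5%, * below 10%."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def significance_table(table):
    """Return (name, t-ratio, p-value, stars) for every coefficient."""
    rows = []
    for name, (estimate, se) in table.items():
        t = estimate / se
        p = two_sided_p(t)
        rows.append((name, round(t, 2), round(p, 4), stars(p)))
    return rows


if __name__ == "__main__":
    for name, t, p, mark in significance_table(TABLE_1):
        print(f"{name:<10} t = {t:6.2f}  p = {p:.4f}  {mark}")
```

Under these assumptions the computed stars match every row of Table 1. For example, mumcoll has t ≈ 2.86 and black has a p-value of about 0.06.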
dadcoll and mumcoll are dummy variables, equal to 1 if the father (respectively, the mother) is a college graduate and 0 otherwise. Holding all other variables in the model constant, a father who is a college graduate is expected to add 0.704 years to the child's completed education. Likewise, a mother who is a college graduate is expected to add 0.461 years.

(d) From Table 1, a one-unit decrease in dist, all other variables unchanged, increases expected years of completed education by 0.016 years. Distance is measured in tens of miles, so 20 miles is 2 units. On average, the expected increase is therefore 2 × 0.016 = 0.032 years. The claim that completed education increases by approximately 0.15 years when the distance to the nearest college falls by 20 miles does not seem tenable. Moreover, the coefficient on dist in Table 1 is not significantly different from zero.

(e) Two models are compared. Model 1 is the full model with all the predictors. Model 2 is a simpler model nested within model 1 that omits dadcoll, mumcoll and cue80. The test statistic for the comparison is

$$F = \frac{(RSS_2 - RSS_1)/(df_2 - df_1)}{RSS_1/df_1}$$

where $RSS_i$ denotes the residual sum of squares of model i (i = 1, 2) and $df_i$ its residual degrees of freedom. The values are $RSS_2 = 2586.62$, $RSS_1 = 2471.523$, $df_2 = 993$ and $df_1 = 990$, so

$$F = \frac{(2586.62 - 2471.523)/3}{2471.523/990} = \frac{38.366}{2.496} = 15.37$$

Under the null hypothesis, F follows an F distribution with 3 and 990 degrees of freedom. At the 5% level of significance the critical value is 2.61. The observed value of F is much larger, so it is significant. The null hypothesis is that the simpler model 2 may be used in place of the full model 1, i.e. that the coefficients on dadcoll, mumcoll and cue80 are jointly zero. This hypothesis is rejected. Taken as a group, the variables dadcoll, mumcoll and cue80 cannot be eliminated from the model.
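The arithmetic in parts (d) and (e) can be reproduced in a few lines. The sketch below has two steps. First, it converts a 20-mile cut into dist units (tens of miles) and attaches a 95% interval, assuming a normal critical value of 1.96. Second, it evaluates the nested-model F statistic from the residual sums of squares quoted in (e). The function names are illustrative, not taken from any library.

```python
# Coefficient on dist and its standard error from Table 1 (dist in tens of miles).
B_DIST, SE_DIST = -0.016, 0.025


def effect_of_distance_cut(miles: float, z: float = 1.96):
    """Expected change in yrsed when distance falls by `miles`, with a normal 95% CI."""
    units = miles / 10.0              # dist is measured in tens of miles
    change = -B_DIST * units          # a decrease in dist reverses the coefficient's sign
    half_width = z * SE_DIST * units
    return change, (change - half_width, change + half_width)


def nested_f(rss_restricted: float, rss_full: float,
             df_restricted: int, df_full: int) -> float:
    """F statistic for a restricted model nested within a full model."""
    numerator = (rss_restricted - rss_full) / (df_restricted - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


if __name__ == "__main__":
    change, (low, high) = effect_of_distance_cut(20)
    print(f"20-mile cut: {change:.3f} years, 95% CI ({low:.3f}, {high:.3f})")
    f_stat = nested_f(2586.62, 2471.523, 993, 990)
    print(f"F(3, 990) = {f_stat:.2f}")  # 5% critical value quoted in (e): 2.61
```

Under the normal approximation, the interval for a 20-mile cut runs from about -0.066 to 0.130 years. The claimed 0.15 years therefore lies outside it.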
(f) Table 1 shows that the coefficients on black and hispanic are both significant, although black is significant only at the 10% level; its p-value is about 6%. Both coefficients are positive. Holding the other regressors constant, being Hispanic is expected to increase years of completed education by 0.54 years, and being black by 0.28 years, relative to the omitted group.

Nevertheless, two issues need to be addressed. First, significance is usually judged at the 5% level or below. As a special case, for example in model building, significance at the 10% level may be accepted, but this should be stated clearly.

The second issue is more complicated. The contention is that blacks and Hispanics complete more years of college than whites. However, the coefficients on black and hispanic measure differences relative to respondents who are neither black nor Hispanic. The data description does not state that only these three groups are considered. The base group may include Asian, Native American or mixed-ethnicity respondents as well as whites. Unless such information is given, no definite conclusion can be drawn about the contention that blacks and Hispanics complete more years of college than whites.

Assignment Part 2

A regression is run with yrsed as the response and female, incomehi and bytest as the predictor variables. The regression equation is

(a) Two models are compared to determine whether leaving out mumcoll introduces omitted variable bias: model 1 includes mumcoll and model 2 does not. The resulting F statistic is checked for significance. The values are $RSS_2 = 2623.37$, $RSS_1 = 2566.95$, $df_2 = 996$ and $df_1 = 995$, so the observed value is

$$F = \frac{(RSS_2 - RSS_1)/(df_2 - df_1)}{RSS_1/df_1} = \frac{(2623.37 - 2566.95)/1}{2566.95/995} = 21.87$$

The F statistic follows an F distribution with 1 and 995 degrees of freedom, whose 5% critical value is about 3.85, so the observed value is significant at the 5% level. mumcoll is therefore a relevant determinant of yrsed. Leaving it out of the model introduces omitted variable bias to the extent that mumcoll is correlated with the included regressors female, incomehi and bytest.
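With a single restriction, the F statistic in part (a) is the square of the t-ratio on mumcoll in model 1. For 995 denominator degrees of freedom, its 5% critical value is close to 1.96² ≈ 3.84. The sketch below illustrates this. It recomputes F from the quoted residual sums of squares and takes a two-sided p-value from the standard normal approximation to √F. The approximation and the helper names are assumptions made for illustration.

```python
import math


def nested_f(rss_restricted: float, rss_full: float,
             df_restricted: int, df_full: int) -> float:
    """F statistic for a restricted model nested within a full model."""
    numerator = (rss_restricted - rss_full) / (df_restricted - df_full)
    return numerator / (rss_full / df_full)


def p_value_single_restriction(f_stat: float) -> float:
    """For one restriction and large df, sqrt(F) is approximately |t| ~ N(0, 1)."""
    t = math.sqrt(f_stat)
    return math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    f_stat = nested_f(2623.37, 2566.95, 996, 995)
    print(f"F(1, 995) = {f_stat:.2f}, implied |t| = {math.sqrt(f_stat):.2f}")
    print(f"approximate p-value = {p_value_single_restriction(f_stat):.2e}")
    print(f"large-sample 5% critical value = {1.96 ** 2:.2f}")
```

The implied |t| of about 4.68 is far beyond 1.96.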
(b) The Farrar-Glauber test for multicollinearity uses a chi-square statistic to test for the presence of multicollinearity in the data. It is based on the natural logarithm of the determinant of R, the correlation matrix of the regressors (equivalently, X'X with the columns of X standardized):

$$\chi^2 = -\left[T - 1 - \tfrac{1}{6}(2p + 5)\right]\ln|R|$$

Here T is the number of observations and p the number of regressors; T = 1000 and p = 3. Under the null hypothesis of no multicollinearity, the statistic follows a chi-square distribution with p(p - 1)/2 = 3 degrees of freedom. With ln|R| = -0.05, the observed value is

$$\chi^2 = -\left[999 - \tfrac{1}{6}(6 + 5)\right](-0.05) = 49.86$$

The 5% critical value is 7.81. Hence there is multicollinearity among the three variables female, incomehi and bytest. However, |R| = e^{-0.05} ≈ 0.95 is close to 1. The collinearity is therefore statistically detectable in a sample of 1000 but mild.

(c) Next, it must be determined which variables are responsible for the multicollinearity. Regressing incomehi on female and bytest gives R² = 0.033. Regressing bytest on female and incomehi gives R² = 0.036. The test statistic for each auxiliary regression is

$$F_i = \frac{R_i^2}{1 - R_i^2}\cdot\frac{T - p}{p - 1}$$

which gives

$$F_1 = \frac{0.033}{1 - 0.033}\cdot\frac{1000 - 3}{3 - 1} = 17.01, \qquad F_2 = \frac{0.036}{1 - 0.036}\cdot\frac{1000 - 3}{3 - 1} = 18.62$$

Each statistic follows an F distribution with (p - 1, T - p) = (2, 997) degrees of freedom, whose 5% critical value is about 3.00. Both statistics exceed this value. Both incomehi and bytest are therefore significantly related to the other regressors, and both contribute to the multicollinearity detected in (b). The small R² values (3.3% and 3.6%) confirm that this collinearity is weak.

(d) White's test for heteroskedasticity checks whether the error variance is constant across all observations. The squared residuals are regressed on the predictors, their squares and their pairwise cross-products. Two predictors, female and incomehi, are binary, so each is identical to its own square, and those squares are dropped. The auxiliary regression therefore has seven regressors: female, incomehi, bytest, bytest², female × incomehi, female × bytest and incomehi × bytest. Its R² is 0.039, so the observed value of the test statistic is T·R² = 1000 × 0.039 = 39. Under homoskedasticity the statistic follows a chi-square distribution with 7 degrees of freedom, whose 5% critical value is 14.07. Hence the null hypothesis of homoskedasticity is rejected.
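The three diagnostic statistics in parts (b) to (d) are simple functions of quantities already reported: ln|R|, the auxiliary R² values and the White auxiliary R². The sketch below reproduces them. It also lists the White auxiliary regressors to show where the 7 degrees of freedom come from. The helper names are hypothetical, and the term listing assumes a full second-order White specification with squares of binary regressors dropped.

```python
import math


def farrar_glauber_chi2(n_obs: int, n_regressors: int, log_det_r: float) -> float:
    """Farrar-Glauber chi-square from ln|R|, R = correlation matrix of regressors."""
    return -(n_obs - 1 - (2 * n_regressors + 5) / 6.0) * log_det_r


def auxiliary_f(r_squared: float, n_obs: int, n_regressors: int) -> float:
    """F statistic for one regressor's auxiliary regression; df = (p - 1, T - p)."""
    return (r_squared / (1.0 - r_squared)) * (n_obs - n_regressors) / (n_regressors - 1)


def white_lm(r_squared_aux: float, n_obs: int) -> float:
    """White's statistic T * R^2 from the auxiliary regression of squared residuals."""
    return n_obs * r_squared_aux


def white_aux_terms(binary, continuous):
    """Auxiliary regressors: levels, squares of continuous ones, all cross-products."""
    names = list(binary) + list(continuous)
    terms = list(names)
    terms += [f"{c}^2" for c in continuous]      # x^2 == x for a 0/1 dummy, so omitted
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            terms.append(f"{names[i]}*{names[j]}")
    return terms


if __name__ == "__main__":
    print(f"chi2(3) = {farrar_glauber_chi2(1000, 3, -0.05):.2f}, |R| = {math.exp(-0.05):.3f}")
    print(f"F1 = {auxiliary_f(0.033, 1000, 3):.2f}, F2 = {auxiliary_f(0.036, 1000, 3):.2f}")
    terms = white_aux_terms(["female", "incomehi"], ["bytest"])
    print(f"White: T*R^2 = {white_lm(0.039, 1000):.1f} with {len(terms)} df: {terms}")
```

The functions return the test statistics only. The critical values (7.81, about 3.00 and 14.07) are the ones quoted in the text.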
Assignment Part 3

(a) Table 1: Quarterly GDP growth rate

Statistic                      Value
Mean                           0.7705%
Standard deviation             0.8795%
Autocorrelation of order 1     34.3%
Autocorrelation of order 2     27.4%
Autocorrelation of order 3     9.2%
Autocorrelation of order 4     8.4%

The mean and standard deviation are in percent; the autocorrelations of the quarterly GDP growth rate are unit free.

(b) A first-order autoregressive model, AR(1), is fitted to the quarterly GDP growth rate. The regression equation is

The regression indicates a significant negative dependence of the quarterly GDP growth rate on that of the previous quarter. The 95% confidence interval for the population AR(1) parameter is (-0.5739, -0.3231).

(c) A second-order autoregressive model, AR(2), is fitted to the quarterly GDP growth rate. The regression equation is

The AR(2) model indicates that the quarterly GDP growth rate depends significantly on its values in the previous two quarters, and both dependences are negative. Other things being equal, an increase in the previous quarter's growth rate lowers this quarter's growth rate. Similarly, an increase in the growth rate two quarters earlier lowers this quarter's growth rate. The AR(2) model is preferred to the AR(1) model because the second lag is significant and the MSE of the AR(2) model is smaller than that of the AR(1) model. The in-sample sum of squared residuals can never increase when a lag is added to the same estimation sample, so the significance of the second lag is the stronger argument. An information criterion such as BIC would give a more formal comparison. The additional lag, the growth rate two quarters earlier, helps explain more of the variability of the GDP growth rate.

(d) AR(1) model for subsample 1 (1974-1984):

AR(1) model for subsample 2 (1985-2009):

To test for a break in the AR(1) model, an F-test compares two models. The restricted model is a single AR(1) model for the whole sample, with SSR = 155.968. The unrestricted model consists of separate AR(1) models for the two subsamples, with SSR = 71.1427 + 31.8707. The observed value of the F statistic is

$$F = \frac{[155.968 - (71.1427 + 31.8707)]/57}{0.8} = 1.16$$

This follows an F distribution with 57 and 195 degrees of freedom, whose 5% critical value is 1.39. Hence the null hypothesis that there is no break in the AR(1) model at 1984-1985 cannot be rejected. A single model may be used for the whole period from 1974 to 2009.
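The statistics in Part 3 rest on two building blocks: sample autocorrelations (Table 1) and an AR(1) regression fitted by OLS. The sketch below shows both, and it makes three assumptions. The autocorrelation uses one common definition (deviations from the full-sample mean, divided by the total sum of squares). The AR(1) standard error is the homoskedastic one rather than a heteroskedasticity-robust one. The confidence interval uses the normal value 1.96. Since the GDP series itself is not reproduced here, the functions take any list of growth rates.

```python
import math
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, using deviations from the full-sample mean."""
    m = mean(series)
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - lag] for t in range(lag, len(series)))
    return num / denom


def fit_ar1(series, z=1.96):
    """OLS of y_t on a constant and y_{t-1}.

    Returns (intercept, slope, 95% CI for the slope, sum of squared residuals),
    using the homoskedasticity-only standard error of the slope.
    """
    y = series[1:]
    x = series[:-1]
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, (slope - z * se, slope + z * se), ssr


if __name__ == "__main__":
    # Illustrative series only (not GDP data): an exact AR(1) path.
    demo = [5.0]
    for _ in range(11):
        demo.append(0.5 + 0.4 * demo[-1])
    print([round(autocorrelation(demo, k), 3) for k in range(1, 5)])
    print(fit_ar1(demo))
```

Applied to the actual quarterly growth series (a hypothetical list `growth`), `fit_ar1(growth)` would give the AR(1) estimate and its interval. Fitting it separately on the two subsamples would give the subsample SSRs used in the break test in (d).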