The Multiple Linear Regression Model Specification - Essay Example

Summary
This paper, 'The Multiple Linear Regression Model Specification', reports that the coefficient for inflation is -0.61945. This indicates a negative relationship between inflation and vote: for every one-unit increase in inflation, the dependent variable decreases by 0.61945 units, holding the other variables constant, and vice versa…

Extract of sample "The Multiple Linear Regression Model Specification"

DATA ANALYSIS

1. Regress vote on the variables in the Table above. Assume that a constant should be included in the model (denote it as ). Provide the multiple linear regression model specification including the independent variables in the same order as they are presented in the table above. [10 marks]

a) Provide an interpretation of the coefficient estimates of that regression. What is the meaning of the estimated intercept in this equation?

SOLUTION

Inflation: the coefficient for inflation is -0.61945. This indicates a negative relationship between inflation and vote: for every one-unit increase in inflation, the dependent variable (vote) decreases by 0.61945 units, holding the other variables constant, and vice versa.

Growth: the coefficient for growth is 0.48663. This indicates a positive relationship between growth and vote: for every one-unit increase in growth, the dependent variable (vote) increases by 0.48663 units, holding the other variables constant, and vice versa.

Goodnews: the coefficient for goodnews is 0.64031. This indicates a positive relationship between goodnews and vote: for every one-unit increase in goodnews, the dependent variable (vote) increases by 0.64031 units, holding the other variables constant, and vice versa.

War: the coefficient for war is -2.66658. This indicates a negative relationship between war and vote: for every one-unit increase in war, the dependent variable (vote) decreases by 2.66658 units, holding the other variables constant, and vice versa.

Person: the coefficient for person is 3.04593. This indicates a positive relationship between person and vote: for every one-unit increase in person, the dependent variable (vote) increases by 3.04593 units, holding the other variables constant, and vice versa.

Estimated intercept (constant): the estimated constant is 48.7337. This means that when all the independent variables are equal to zero, the value of vote is estimated at 48.7337.
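The interpretation above can be checked numerically. The following is a minimal sketch, not the original EViews workflow: it hard-codes the coefficient estimates reported in the solution and assumes the usual linear form (intercept plus coefficient times regressor, summed). The function name `predict_vote` and the regressor values are hypothetical and chosen only for illustration.

```python
# Coefficient estimates as reported in the solution above.
COEFFICIENTS = {
    "inflation": -0.61945,
    "growth": 0.48663,
    "goodnews": 0.64031,
    "war": -2.66658,
    "person": 3.04593,
}
INTERCEPT = 48.7337


def predict_vote(**regressors):
    """Fitted vote for the given regressor values (omitted ones count as 0)."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in regressors.items()
    )


# Hypothetical regressor values, used only to illustrate the arithmetic.
base = predict_vote(inflation=2.0, growth=1.0, goodnews=3.0, war=0.0, person=1.0)
bumped = predict_vote(inflation=2.0, growth=2.0, goodnews=3.0, war=0.0, person=1.0)

# Raising growth by one unit, with the other regressors fixed, shifts the
# fitted vote by the growth coefficient.
print(round(bumped - base, 5))

# With every regressor at zero, the fitted value is the intercept.
print(predict_vote())
```

The difference between the two fitted values equals the growth coefficient, which is what "holding the other variables constant" means in practice.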
b) Perform tests for the statistical significance of the parameters of the independent variables inflation, growth and goodnews using the critical value of the corresponding t-distribution and the test p-value. Interpret the test results.

SOLUTION

Inflation: the computed t-statistic is -1.37. Its absolute value, 1.37, is less than the critical value of 2.0423, so we fail to reject the null hypothesis. Similarly, the p-value is 0.183 > 0.05 (significance level), so we again fail to reject the null hypothesis and conclude that the parameter of inflation is insignificant in the model at the 5% significance level.

Growth: the computed t-statistic is 3.03, a value greater than the critical value of 2.0423, so we reject the null hypothesis. Similarly, the p-value is 0.006 < 0.05 (significance level), so we reject the null hypothesis and conclude that the parameter of growth is significant in the model at the 5% significance level.

Goodnews: the p-value is greater than 0.05 (significance level), so we fail to reject the null hypothesis and conclude that the parameter of goodnews is insignificant in the model at the 5% significance level.

2. Perform a joint significance test for the independent variables of the model using both the p-value and the critical value of the F-distribution. [5 marks]

a) Comment on the goodness-of-fit of the model. What other factors might affect vote?

SOLUTION

F(5, 25) = 5.79. This value is greater than the critical F-value (2.6030) from the tables, so we reject the null hypothesis. Similarly, the p-value is 0.0011, a value less than the 5% significance level, so we again reject the null hypothesis. We thus conclude that the model as a whole is significant and that the independent variables jointly help to predict the dependent variable (vote).
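The decision rules used in these tests can be sketched as a small helper. This is only an illustration under stated assumptions: the statistics, critical values and p-values are the ones quoted in the solution (with the growth p-value read as 0.006), and the function names `reject_by_critical_value` and `reject_by_p_value` are hypothetical.

```python
def reject_by_critical_value(statistic, critical_value, two_sided=True):
    """True when the statistic falls in the rejection region."""
    value = abs(statistic) if two_sided else statistic
    return value > critical_value


def reject_by_p_value(p_value, alpha=0.05):
    """True when the p-value is below the significance level."""
    return p_value < alpha


T_CRITICAL = 2.0423  # critical t-value quoted in the solution
F_CRITICAL = 2.6030  # critical F-value quoted in the solution

decisions = {
    "inflation, t-test": reject_by_critical_value(-1.37, T_CRITICAL),
    "inflation, p-value": reject_by_p_value(0.183),
    "growth, t-test": reject_by_critical_value(3.03, T_CRITICAL),
    "growth, p-value": reject_by_p_value(0.006),
    # The F-test is one-sided: only large values count against the null.
    "joint F(5, 25)": reject_by_critical_value(5.79, F_CRITICAL, two_sided=False),
    "joint F, p-value": reject_by_p_value(0.0011),
}

for name, rejected in decisions.items():
    print(f"{name}: {'reject H0' if rejected else 'fail to reject H0'}")
```

In each case the critical-value rule and the p-value rule agree, as they should for the same test at the same significance level.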
Other factors that might affect vote could be:
i) Media coverage of contestants or political parties
ii) Age and background of voters
iii) Social class (employment and unemployment of people)

b) What are the consequences of the results of this F-test together with those of the t-tests (from question 1) for the specification of the model?

SOLUTION

The F-test evaluates the null hypothesis that all regression coefficients other than the intercept are equal to zero versus the alternative that at least one is not. An equivalent null hypothesis is that R-squared equals zero. A significant F-test indicates that the observed R-squared is reliable and is not a spurious result of oddities in the data set. Thus, the F-test determines whether the proposed relationship between the response variable and the set of predictors is statistically reliable, and can be useful when the research objective is either prediction or explanation. Overall, we would say that the model is reliable for predicting the dependent variable (vote). However, using the t-tests we find that four variables (inflation, goodnews, war and person) are individually insignificant, and as such we should consider removing them from the model.

3. Test the hypothesis that an extra 0.25% in the real per capita GDP growth rate has double the effect on vote of a decrease of 0.5% in inflation. [9 marks]

a) Use the command available in EViews to test for the corresponding coefficient restriction.

b) Perform the test analytically.

To obtain the answer to this question, we conducted a marginal effects test.

c) Interpret the test results.

First, we observe that the p-value for the coefficient of growth is 0.000, implying that the marginal effect of growth is significant at the 5% significance level. The dy/dx for growth is close to four times that of inflation. Because the restriction 0.25 × (growth effect) = 2 × 0.5 × (inflation effect) holds exactly when the growth effect is four times the inflation effect in magnitude, this is consistent with the hypothesis that an extra 0.25% in the real per capita GDP growth rate has double the effect on vote of a decrease of 0.5% in inflation.

4.
Answer the sub-questions below on multicollinearity analysis in the model. [8 marks]

a) Test for multicollinearity between the independent variables growth and inflation in the model. Explain your answer using EViews outputs.

Using Klein's Rule of Thumb, if the value of R2 for the auxiliary regression is greater than that of the original regression, then multicollinearity is probably present. The VIF column shows the factor by which the variance of each coefficient estimate is inflated because of that predictor's correlation with the other predictors. We see that growth has no impact on the variance of inflation, and so there is no multicollinearity between the two variables.

b) Assuming that there is multicollinearity between those variables:

i) Explain how you would resolve this problem. Explain your answer using EViews outputs.

SOLUTION

We may be able to resolve the problem of multicollinearity by centering, that is, by subtracting the mean from the predictor values before generating the squared term.

ii) What are the consequences of multicollinearity for the OLS estimator?

SOLUTION

The OLS Estimator is Still BLUE
In the presence of multicollinearity, the OLS estimator remains unbiased. Also, within the class of linear unbiased estimators, the OLS estimator still has minimum variance. As such, we cannot find an alternative linear unbiased estimator that is better than the OLS estimator. However, even though the OLS estimator is the "best" estimator, it may not be very good.

The Fit of the Sample Regression Equation is Unaffected
For the OLS estimator, the "overall fit" of the sample regression equation, as measured by the R-squared statistic, is not affected by the presence of multicollinearity. Thus, if the sole objective of our empirical study is prediction or forecasting, as in this case, then multicollinearity does not matter.
The Variances and Standard Errors of the Parameter Estimates Will Increase
The worst effect of multicollinearity is that it increases the variances and the standard errors of the OLS estimates. High variance implies that the estimates are not precise and are therefore unreliable to some extent. High variances and standard errors imply low t-statistics. As such, multicollinearity increases the chances of making a type II error, that is, failing to reject the null hypothesis when it is false, and therefore concluding that Y is not affected by X when in fact it is. That is to say, multicollinearity makes it difficult to detect an effect if one exists.

5. Perform a graphical analysis to detect the presence of heteroscedasticity in the model using at least two different plots. Do you find evidence of heteroscedasticity? Why? Explain the consequences of heteroscedasticity on the OLS estimator. [4 marks]

Looking at the plot, we can see that there are no extreme outliers, and thus we conclude that there is no evidence of heteroscedasticity.

Consequences of heteroscedasticity on the OLS estimator:
i) The OLS estimators are still unbiased and consistent. This is because none of the independent variables is correlated with the error term, so a correctly specified equation will give us estimated coefficients that are very close to the real parameters.
ii) Heteroscedasticity affects the distribution of the estimated coefficients, increasing the variances of the distributions and therefore making the OLS estimators inefficient; that is, they are no longer BLUE.
iii) Under heteroscedasticity the usual formulas underestimate the variances of the estimators (the estimated variances and covariances of the OLS estimates are biased and inconsistent), leading to higher values of the t and F statistics.
iv) Hypothesis tests are not valid.

6. Perform a White test for heteroscedasticity. [8 marks]

a) Provide the auxiliary regression and explain the meaning of the null hypothesis for this test.
SOLUTION

We evaluate the null hypothesis using the p-value; the null hypothesis in this case is homoscedasticity, with heteroscedasticity as the alternative. The p-value is 0.6834 > 0.05 (significance level), so we fail to reject the null hypothesis and conclude that there is no evidence of heteroscedasticity in the data.

b) Why is the White test preferred to the Breusch-Pagan and Goldfeld-Quandt tests for heteroscedasticity? Explain your answer.

SOLUTION

The White test is a general test for heteroscedasticity and has the following advantages over the two other tests (Breusch-Pagan and Goldfeld-Quandt):
i) The White test does not require one to specify any model of the structure of the heteroscedasticity, if it exists at all.
ii) The White test does not depend on the assumption that the errors are normally distributed, which the two other tests (Breusch-Pagan and Goldfeld-Quandt) rely on.
iii) The White test specifically tests whether the presence of heteroscedasticity causes the OLS formula for the variances and the covariances of the estimates to be incorrect.

Lastly, Breusch-Pagan works well for linear forms of heteroscedasticity but not for non-linear forms, while Goldfeld-Quandt is more complex and less flexible to use than the White test.

7. Assume that there is heteroscedasticity of the form: How would you resolve the problem of heteroscedasticity in this case? Explain your answer analytically. [4 marks]

SOLUTION

And since It is therefore obvious that We now put all t elements in matrices and obtain estimates of alpha for the model by solving

8. Estimate the model using White's autocorrelation and heteroscedasticity consistent standard errors. Comment on the results of that estimation in relation to the estimation results in question 1. When do we use White standard errors? [5 marks]

SOLUTION

The table above gives the White's autocorrelation test.
The p-values are greater than the 5% significance level, so we fail to reject the null hypothesis of serially uncorrelated errors; hence there is no evidence of autocorrelation in the model. The p-value is 0.9772 (a value greater than the 5% significance level), so we fail to reject the null hypothesis and conclude that there is no evidence of heteroscedasticity in the model. We should use White standard errors when we detect the presence of heteroscedasticity in the model.

9. Provide a graphical analysis of the residuals to detect the presence of autocorrelation using at least two different plots. What are the consequences of autocorrelation on the OLS estimator? [4 marks]

The graphs clearly show that there is no autocorrelation in the model specified.

Consequences of autocorrelation on the OLS estimator:
i) OLS estimators remain unbiased and linear.
ii) The property of minimum variance no longer holds in the presence of autocorrelation.
iii) The usual formulas for estimating variances are biased, whether the autocorrelation is negative or positive.
iv) Confidence intervals and hypothesis tests based on the t and F distributions are unreliable.
v) Since is affected so does
vi) Computed standard errors and variances of forecasts might be inaccurate.

10. Test for autocorrelation in the residuals using an appropriate procedure. [4 marks]

SOLUTION

We tested for autocorrelation using the Durbin-Watson test. If the observed value of the test statistic is greater than the tabulated upper bound, then we fail to reject the null hypothesis of non-autocorrelated errors against the alternative of positive first-order autocorrelation. Since 2.375817 is greater than 1.920, we fail to reject the null hypothesis and conclude that the errors in the model are non-autocorrelated.

11. All other factors being equal, is there evidence that the incumbent running for the election is a determinant of the percentage share of the vote won by the incumbent party?
How strong is the evidence? Show all steps of the corresponding test to answer this question. [8 marks]

SOLUTION

We ran a correlation test to identify whether there is evidence of correlation between vote and person. The Pearson correlation coefficient is 0.2746, showing that there is evidence that the incumbent running for the election is a determinant of the percentage share of the vote won by the incumbent party. However, the relationship is weak (though positive).

12. Describe step by step how you could test for the best functional form for the model in Question 1. [9 marks]

SOLUTION

We use the Ramsey RESET test. Ramsey argued that various specification errors (omitted variables, incorrect functional form, correlation between X and U) give rise to a nonzero U vector. The null and alternative hypotheses are $H_0: U \sim N(0, \sigma^2 I)$ vs $H_1: U \sim N(\mu, \sigma^2 I)$, $\mu \neq 0$. The test of $H_0$ is based on an augmented regression $y = X\beta + Z\alpha + U$. The test for specification error is then $\alpha = 0$. Ramsey's suggestion is that Z should contain powers of the predicted values of the dependent variable. Using the second, third, and fourth powers gives $Z = [\hat{y}^2 \;\; \hat{y}^3 \;\; \hat{y}^4]$, where $\hat{y}^2 = [\hat{y}_1^2, \ldots, \hat{y}_n^2]'$, etc. The first power, $\hat{y}$, is not included since it is an exact linear combination of the columns of X. Its inclusion would make the regressor matrix [X Z] have less than full rank.

Based on the table above, we observe that the p-value is 0.3897 > 0.05 (significance level), so we fail to reject the null hypothesis and find no evidence of misspecification such as omitted variables.

13. Test the assumption of normality in the residuals of the selected model in question 1 by using the Jarque-Bera (JB) test. Comment on the implications of your JB test results on the properties of the OLS estimator. [6 marks]

SOLUTION

The above results indicate that the p-value is greater than 5%, so we fail to reject the null hypothesis and conclude that the residuals in the model follow a normal distribution.

14. For what purpose can your analysis above be used by political parties in general elections? Explain your answer.
[6 marks]

SOLUTION

The above analysis can be used by political parties to lay down strategies on how to win elections, knowing that close to 54% (R2 = 0.5365) of the variation in vote is explained by the five independent variables in the model. The analysis will help them improve on the factors affecting voting patterns.
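As a consistency check on the figures quoted in this analysis, the overall F-statistic can be recovered from R-squared using F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below assumes k = 5 regressors and 25 residual degrees of freedom, as implied by the F(5, 25) reported in question 2; the function name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r_squared, num_regressors, resid_df):
    """Overall F-statistic implied by R-squared and the degrees of freedom."""
    explained = r_squared / num_regressors      # explained share per regressor
    unexplained = (1.0 - r_squared) / resid_df  # unexplained share per residual df
    return explained / unexplained


# R2 from the solution above; degrees of freedom taken from F(5, 25).
f_stat = f_from_r_squared(0.5365, num_regressors=5, resid_df=25)
print(round(f_stat, 2))
```

Rounded to two decimals the result is 5.79, which agrees with the F-statistic reported for the joint significance test.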