Statistical Dataset Variables Significance Essay Example | Topics and Well Written Essays - 1500 words

Statistics Essay

1. Calculate the pair-wise correlation coefficients between sales per square metre and each of the other variables and test their statistical significance. Produce scatter plots for each pair of variables. Provide a written interpretation for each of the correlation coefficients and the related scatter plots.

SOLUTION

The Pearson correlation coefficient is one of the most commonly used measures of the strength of the linear relationship between two variables; however, its value alone does not fully characterise the relationship between the two variables (Mahdavi & Babak, 2012). The coefficient ranges from -1 to +1. A value close to +1 indicates a strong positive linear correlation, a value of -1 indicates a perfect negative linear correlation, and a positive value close to zero indicates a weak positive correlation (Székely, et al., 2007). A value close to -1 indicates a strong negative correlation, and a negative value close to zero indicates a weak negative correlation (Mahdavi, 2013). A coefficient of zero means that there is no linear correlation between the two variables (Nikolić, et al., 2012).

Correlations (N = 400 for every pair)

Variables: (1) Sales per square metre; (2) Number of full-timers; (3) Number of part-timers; (4) Total number of hours worked; (5) Sales floor space of the store (in square metres)

                                                            (1)      (2)      (3)      (4)      (5)
(1) Sales per square metre            Pearson Correlation   1        .237**   .050     .263**   -.294**
                                      Sig. (2-tailed)                .000     .318     .000     .000
(2) Number of full-timers             Pearson Correlation   .237**   1        .289**   .531**   .350**
                                      Sig. (2-tailed)       .000              .000     .000     .000
(3) Number of part-timers             Pearson Correlation   .050     .289**   1        .249**   .366**
                                      Sig. (2-tailed)       .318     .000              .000     .000
(4) Total number of hours worked      Pearson Correlation   .263**   .531**   .249**   1        .576**
                                      Sig. (2-tailed)       .000     .000     .000              .000
(5) Sales floor space of the store    Pearson Correlation   -.294**  .350**   .366**   .576**   1
                                      Sig. (2-tailed)       .000     .000     .000     .000

**. Correlation is significant at the 0.01 level (2-tailed).

Scatterplots

[Figure: scatter plots for each pair of variables]

Interpretation for each of the correlation coefficients

i) Sales per square metre versus number of full-timers: The correlation coefficient is 0.237, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
ii) Sales per square metre versus number of part-timers: The correlation coefficient is 0.050, which represents a weak positive relationship. The p-value is 0.318 (greater than α = 0.05), so the relationship is not statistically significant at the 5% significance level.
iii) Sales per square metre versus total number of hours worked: The correlation coefficient is 0.263, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
iv) Sales per square metre versus sales floor space of the store: The correlation coefficient is -0.294, which represents a weak negative relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
v) Number of full-timers versus number of part-timers: The correlation coefficient is 0.289, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
vi) Number of full-timers versus total number of hours worked: The correlation coefficient is 0.531, which represents a moderately strong positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
vii) Number of full-timers versus sales floor space of the store (in square metres): The correlation coefficient is 0.350, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
viii) Number of part-timers versus total number of hours worked: The correlation coefficient is 0.249, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
ix) Number of part-timers versus sales floor space of the store (in square metres): The correlation coefficient is 0.366, which represents a weak positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.
x) Total number of hours worked versus sales floor space of the store (in square metres): The correlation coefficient is 0.576, which represents a moderately strong positive relationship. The p-value is reported as .000 (less than α = 0.05), so the relationship is statistically significant at the 5% significance level.

2. Write down an equation representing a linear regression model in which sales per square metre depend on a constant, the total number of hours worked and floor space of the store (in square metres). Estimate the equation, report the results, and comment on the overall goodness of the model.

SOLUTION

In this case we run an OLS regression analysis.
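
As a rough illustration of what an OLS fit does for a model of the form sales = b0 + b1 × hours + b2 × floor space + error, the sketch below solves the normal equations in plain Python. The function names and the five-observation toy data are assumptions made purely for illustration; they are not the 400-store dataset analysed in this essay.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(y, predictors):
    """Return ([intercept, slopes...], R-squared) for an OLS fit of y."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]  # design matrix with constant
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical toy data, generated without noise from y = 5 + 2*hours - 3*space,
# so the fit should recover those values up to floating-point rounding.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
space = [2.0, 1.0, 4.0, 3.0, 6.0]
sales = [5.0 + 2.0 * h - 3.0 * s for h, s in zip(hours, space)]

beta, r_squared = fit_ols(sales, [hours, space])
print([round(b, 6) for b in beta], round(r_squared, 6))
```

With real data the residual term is not zero, so R-squared falls below 1 and the coefficient estimates carry standard errors, which is what the tables in this section report.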
Regression analysis is a statistical process for modelling and estimating the relationship (positive or negative) among variables (Kutner, et al., 2004). It covers various techniques for modelling and analysing two or more variables, with a focus on the relationship between the response variable and one or more explanatory variables (Lindley, 1987). Once the regression model has been constructed, it is crucial to check the goodness of fit of the estimated model and the statistical significance of the estimated parameters (Fotheringham, et al., 2002). We checked statistical significance using an F-test for the overall fit, followed by t-tests for the individual parameters (Galton, 1989).

Estimation of the equation:

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .605a   .366       .363                2985.371
a. Predictors: (Constant), Sales floor space of the store (in square metres), Total number of hours worked

Based on the results, the estimated equation (using the coefficient estimates reported under question 3) is:

Sales per square metre = 5133.590 + 37.528 × (total number of hours worked) - 22.145 × (sales floor space of the store in square metres)

ANOVAb
Model 1      Sum of Squares   df    Mean Square    F         Sig.
Regression   2.041E9          2     1.020E9        114.495   .000a
Residual     3.538E9          397   8912441.334
Total        5.579E9          399
a. Predictors: (Constant), Sales floor space of the store (in square metres), Total number of hours worked
b. Dependent Variable: Sales per square metre

The adjusted R-squared value is 0.363. The regression is significant at the 5% significance level: the F statistic is 114.495 with a p-value reported as .000 (less than α = 0.05), so we reject the null hypothesis that the explanatory variables jointly have no effect and conclude that the overall model has explanatory power. R-squared is 0.366 (the regression sum of squares, 2.041E9, divided by the total sum of squares, 5.579E9), implying that 36.6% of the variation in the dependent variable (sales per square metre) is explained by the two independent variables in the model.

3. Interpret the estimated coefficients from an economic perspective and comment on their statistical significance.
SOLUTION

Coefficientsa
Model 1                                             B          Std. Error   Beta     t         Sig.
(Constant)                                          5133.590   321.693               15.958    .000
Total number of hours worked                        37.528     2.837        .647     13.227    .000
Sales floor space of the store (in square metres)   -22.145    1.625        -.666    -13.627   .000
a. Dependent Variable: Sales per square metre
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)

The coefficient of total number of hours worked (hoursw) is 37.528. Holding floor space constant, a one-unit increase in hoursw is associated with an increase of 37.528 in sales per square metre, and a one-unit decrease with a fall of the same size. The coefficient for floor space (ssize) is -22.145, implying that, holding hours worked constant, each one-unit increase in ssize is associated with a decrease of 22.145 in sales per square metre. Both coefficients are statistically significant at the 5% significance level: their t statistics are 13.227 and -13.627, and the p-value is reported as .000 in each case.

4. Does the inclusion of the number of full-timers and part-timers significantly improve the model?

SOLUTION

Adjusted R-squared is 0.396 when the number of full-timers and part-timers are included in the model and 0.363 when they are not, so including the two variables raises adjusted R-squared. Because adjusted R-squared penalises additional predictors, this increase suggests a genuine improvement, although it is not itself a formal test. As a rough check using the rounded R-squared values, the partial F statistic is ((0.402 - 0.366)/2) / ((1 - 0.402)/395) ≈ 11.9, well above the 5% critical value of about 3.0 for (2, 395) degrees of freedom. It is therefore reasonable to conclude that the inclusion of the number of full-timers and part-timers significantly improves the model.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .634a   .402       .396                2905.818
a. Predictors: (Constant), Number of part-timers, Total number of hours worked, Number of full-timers, Sales floor space of the store (in square metres)

Conclusions

In this paper we report on the correlations between the dependent variable (sales per square metre) and the explanatory variables, and on the multiple regression analysis. The results show that sales are related to the explanatory variables; however, only three of the four variables had a statistically significant correlation with the dependent variable.
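
The significance verdicts for the four sales correlations can be cross-checked from the reported r values and N = 400 alone, using the usual t statistic for a Pearson coefficient, t = r × sqrt(N - 2) / sqrt(1 - r²). The sketch below is illustrative only: the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because N is large.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_tailed_p(t):
    """Two-tailed p-value from a normal approximation (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


N = 400  # sample size reported in the correlation table
# Correlations of each variable with sales per square metre, as reported.
reported_r = {
    "Number of full-timers": 0.237,
    "Number of part-timers": 0.050,
    "Total number of hours worked": 0.263,
    "Sales floor space": -0.294,
}

for name, r in reported_r.items():
    t = pearson_t_statistic(r, N)
    p = approx_two_tailed_p(t)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: r = {r:+.3f}, t = {t:.3f}, p ~ {p:.4f} ({verdict} at 5%)")
```

For the part-timers correlation (r = 0.050) this gives a p-value of roughly 0.32, in line with the .318 reported in the correlation table; the other three p-values come out far below 0.01.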
Of the three variables with statistically significant correlations, two had a positive linear relationship with the dependent variable and one had a negative linear relationship. The regression analysis shows that the explanatory variables in the model do not fully explain the variation in the dependent variable: only 36.6% of the variation is explained by the explanatory variables in the model. A large share of the variation is therefore attributable to factors not included in the model, which are absorbed by the error term. Lastly, we observe that including the number of full-timers and part-timers slightly improves the model. Without these two variables the adjusted R-squared is 0.363; when they are included it increases to 0.396. This suggests that two relevant variables had been excluded from the model. It is therefore prudent for researchers to evaluate the variables affecting the dependent variable critically and in detail before settling on a model.

Works Cited

Fotheringham, A. S., Brunsdon, C. & Charlton, M., 2002. Geographically weighted regression: the analysis of spatially varying relationships (Reprint ed.). Chichester, England: John Wiley, 5(9), pp. 78-82.
Galton, F., 1989. Kinship and Correlation (reprinted 1989). Statistical Science (Institute of Mathematical Statistics), 4(2), pp. 80-86.
Kutner, M. H., Nachtsheim, C. J. & Neter, J., 2004. Applied Linear Regression Models. McGraw-Hill/Irwin, Boston, 5(6), pp. 25-39.
Lindley, D. V., 1987. Regression and correlation analysis. New Palgrave: A Dictionary of Economics, 4(5), pp. 120-23.
Mahdavi, D. & Babak, 2012. The Misleading Value of Measured Correlation. Wilmott, 1(2012), pp. 64-73.
Mahdavi, D. B., 2013. The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model. Wilmott Magazine, 5(2), pp. 34-41.
Nikolić, D., Muresan, R. C., Feng, W. & Singer, W., 2012. Scaled correlation analysis: a better way to compute a cross-correlogram. European Journal of Neuroscience, 4(7), pp. 1-21.
Rodgers, J. L. & Nicewander, W. A., 1988. Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), pp. 59-66.
Székely, G. J., Rizzo, M. L. & Bakirov, N. K., 2007. Measuring and testing independence by correlation of distances. Annals of Statistics, 35(6), pp. 2769-2794.
