StudentShare

Statistical Dataset Variables Significance - Essay Example

Statistics Essay

1. Calculate the pair-wise correlation coefficients between sales per square metre and each of the other variables and test their statistical significance. Produce scatter plots for each pair of variables. Provide a written interpretation for each of the correlation coefficients and the related scatter plots.

SOLUTION

The Pearson correlation coefficient is one of the most commonly used measures of the strength of the relationship between two variables; however, its value does not fully characterise that relationship, because it captures only the strength and direction of the linear association (Mahdavi & Babak, 2012). The coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive linear correlation, a value close to +1 a strong positive linear correlation, and a positive value close to zero a weak positive correlation (Székely et al., 2007). Conversely, a value of -1 indicates a perfect negative linear correlation, a value close to -1 a strong negative correlation, and a negative value close to zero a weak negative correlation (Mahdavi, 2013). A coefficient of zero means that there is no linear correlation between the two variables (Nikolić et al., 2012).

Correlations (Pearson r, with the two-tailed p-value in parentheses; N = 400 for every pair)

| | Sales per square metre | Number of full-timers | Number of part-timers | Total number of hours worked | Sales floor space (m²) |
|---|---|---|---|---|---|
| Sales per square metre | 1 | .237** (.000) | .050 (.318) | .263** (.000) | -.294** (.000) |
| Number of full-timers | .237** (.000) | 1 | .289** (.000) | .531** (.000) | .350** (.000) |
| Number of part-timers | .050 (.318) | .289** (.000) | 1 | .249** (.000) | .366** (.000) |
| Total number of hours worked | .263** (.000) | .531** (.000) | .249** (.000) | 1 | .576** (.000) |
| Sales floor space of the store (m²) | -.294** (.000) | .350** (.000) | .366** (.000) | .576** (.000) | 1 |

**. Correlation is significant at the 0.01 level (2-tailed).

[Figure: Scatterplots for each pair of variables]

Interpretation of each correlation coefficient

i) Sales per square metre versus number of full-timers
The correlation coefficient is 0.237, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

ii) Sales per square metre versus number of part-timers
The correlation coefficient is 0.050, which indicates a very weak positive relationship. The p-value is 0.318, which is above α = 0.05, so the relationship is not statistically significant at the 5% level.

iii) Sales per square metre versus total number of hours worked
The correlation coefficient is 0.263, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

iv) Sales per square metre versus sales floor space of the store
The correlation coefficient is -0.294, which indicates a weak negative relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

v) Number of full-timers versus number of part-timers
The correlation coefficient is 0.289, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.
vi) Number of full-timers versus total number of hours worked
The correlation coefficient is 0.531, which indicates a moderately strong positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

vii) Number of full-timers versus sales floor space of the store (in square metres)
The correlation coefficient is 0.350, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

viii) Number of part-timers versus total number of hours worked
The correlation coefficient is 0.249, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

ix) Number of part-timers versus sales floor space of the store (in square metres)
The correlation coefficient is 0.366, which indicates a weak positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

x) Total number of hours worked versus sales floor space of the store (in square metres)
The correlation coefficient is 0.576, which indicates a moderately strong positive relationship. The p-value is 0.000 (p < 0.001), which is below α = 0.05, so the relationship is statistically significant at the 5% level.

2. Write down an equation representing a linear regression model in which sales per square metre depend on a constant, the total number of hours worked and floor space of the store (in square metres). Estimate the equation, report the results, and comment on the overall goodness of the model.

SOLUTION

The model is

Sales per square metre_i = β0 + β1 × (total number of hours worked)_i + β2 × (sales floor space in m²)_i + u_i,

where i indexes the 400 stores and u_i is the error term. We estimate it by ordinary least squares (OLS).
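The estimation step and the goodness-of-fit statistics used in this answer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS procedure that produced the output in this essay: the function names (`solve_linear_system`, `ols`, `sums_of_squares`, `fit_statistics`) are hypothetical, the model is assumed to contain a constant plus k slope regressors, and standard errors and p-values are omitted. `ols` solves the normal equations X'X b = X'y, and `fit_statistics` computes R² = SS_reg / SS_tot, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and F = (SS_reg / k) / (SS_res / (n - k - 1)).

```python
# Minimal OLS sketch in pure Python (no external libraries).
# All function names are hypothetical; this is not the SPSS routine used in the essay.


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are not modified.
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(y, *regressors):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk via X'X b = X'y."""
    # Each design-matrix row is [1, x1_i, ..., xk_i]; the leading 1 is the constant.
    rows = [[1.0] + [float(x[i]) for x in regressors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def sums_of_squares(y, fitted):
    """Regression and residual sums of squares (valid when the model has a constant)."""
    mean_y = sum(y) / len(y)
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return ss_reg, ss_res


def fit_statistics(ss_reg, ss_res, n, k):
    """R-squared, adjusted R-squared and overall F for k slopes plus a constant."""
    r2 = ss_reg / (ss_reg + ss_res)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ss_reg / k) / (ss_res / (n - k - 1))
    return r2, adj_r2, f_stat


# Usage with the regression and residual sums of squares from the ANOVA table.
r2, adj_r2, f_stat = fit_statistics(2.041e9, 3.538e9, n=400, k=2)
print(round(r2, 3), round(adj_r2, 3), round(f_stat, 1))  # prints: 0.366 0.363 114.5
```

Fed the sums of squares from the ANOVA table (2.041E9 and 3.538E9, with n = 400 and k = 2), the sketch gives R² ≈ 0.366, adjusted R² ≈ 0.363 and F ≈ 114.5; the small gap from the reported F of 114.495 comes from the rounding of the sums of squares in that table. With the raw store data, `ols(sales, hours, floor_space)` would return the constant and the two slopes, where `sales`, `hours` and `floor_space` are hypothetical lists holding the 400 observations.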
Regression analysis is a statistical process for modelling and estimating relationships (positive or negative) among variables (Kutner et al., 2004); it covers a range of techniques for modelling and analysing two or more variables, with a focus on the relationship between a response variable and one or more explanatory variables (Lindley, 1987). Once the regression model has been estimated, it is crucial to check its goodness of fit and the statistical significance of the estimated parameters (Fotheringham et al., 2002). We test the overall fit with an F-test and then the individual parameters with t-tests (Galton, 1989).

Estimation of the equation:

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .605 (a) | .366 | .363 | 2985.371 |

a. Predictors: (Constant), Sales floor space of the store (in square metres), Total number of hours worked

Based on the results (coefficients as reported in the Coefficients table under question 3), the estimated equation is:

Sales per square metre = 5133.590 + 37.528 × (total number of hours worked) - 22.145 × (sales floor space in m²)

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 2.041E9 | 2 | 1.020E9 | 114.495 | .000 (a) |
| Residual | 3.538E9 | 397 | 8912441.334 | | |
| Total | 5.579E9 | 399 | | | |

a. Predictors: (Constant), Sales floor space of the store (in square metres), Total number of hours worked
b. Dependent Variable: Sales per square metre

The F statistic is 114.495 with a p-value of 0.000 (p < 0.001), which is below α = 0.05. We therefore reject the null hypothesis that both slope coefficients are zero and conclude that the regression is jointly significant at the 5% level. R-squared is 0.366, meaning that 36.6% of the variation in the dependent variable (sales per square metre) is explained by the two independent variables in the model; the adjusted R-squared is 0.363. The model is thus statistically significant, but its overall fit is only moderate, since about two thirds of the variation remains unexplained.

3. Interpret the estimated coefficients from an economic perspective and comment on their statistical significance.
SOLUTION

Coefficients (a)

| Model 1 | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 5133.590 | 321.693 | | 15.958 | .000 |
| Total number of hours worked | 37.528 | 2.837 | .647 | 13.227 | .000 |
| Sales floor space of the store (in square metres) | -22.145 | 1.625 | -.666 | -13.627 | .000 |

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Sales per square metre

The coefficient on the total number of hours worked (hoursw) is 37.528: holding floor space constant, a one-unit increase in hoursw is associated with an increase of about 37.53 in sales per square metre, and a one-unit decrease with a fall of the same size. The coefficient on floor space (ssize) is -22.145: holding hours worked constant, each additional square metre of sales floor space is associated with a decrease of about 22.15 in sales per square metre. From an economic perspective, more labour input raises the sales generated by each square metre of the store, whereas adding floor space without adding labour spreads sales over a larger area and lowers sales per square metre. Both coefficients, as well as the constant, are statistically significant at the 5% level: their t-statistics (13.227 for hoursw, -13.627 for ssize and 15.958 for the constant) all have p-values of 0.000 (p < 0.001).

4. Does the inclusion of the number of full-timers and part-timers significantly improve the model?

SOLUTION

When the number of full-timers and the number of part-timers are added to the model, the adjusted R-squared rises from 0.363 to 0.396 (and the R-squared from 0.366 to 0.402). Because adjusted R-squared penalises additional regressors, this increase shows that the two variables add explanatory power. A partial F-test on the two added variables confirms that the improvement is statistically significant: F = [(0.402 - 0.366)/2] / [(1 - 0.402)/(400 - 5)] ≈ 11.9, well above the 5% critical value of F(2, 395), which is about 3.02. Including the number of full-timers and part-timers therefore significantly improves the model.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .634 (a) | .402 | .396 | 2905.818 |

a. Predictors: (Constant), Number of part-timers, Total number of hours worked, Number of full-timers, Sales floor space of the store (in square metres)

Conclusions

In this paper we report on the correlations between the dependent variable (sales per square metre) and the explanatory variables, and on the multiple regression analysis. The results show that sales per square metre is correlated with all four explanatory variables, but only three of these correlations are statistically significant.
Of these three variables, two (the number of full-timers and the total number of hours worked) have a positive linear relationship with the dependent variable, while one (sales floor space) has a negative linear relationship.

As regards the regression, the explanatory variables in the model do not fully explain the variation in the dependent variable: they explain only 36.6% of it. The remaining 63.4% is attributable to factors not included in the model, which are absorbed by the error term.

Lastly, including the number of full-timers and part-timers modestly improves the model: the adjusted R-squared is 0.363 without these two variables and rises to 0.396 when they are included. This suggests that the original model omitted two relevant variables. Researchers should therefore evaluate carefully which variables affect the dependent variable before settling on a model.

Works Cited

Fotheringham, A. S., Brunsdon, C. & Charlton, M., 2002. Geographically weighted regression: the analysis of spatially varying relationships (Reprint ed.). Chichester, England: John Wiley, 5(9), pp. 78-82.
Galton, F., 1989. Kinship and Correlation (reprinted 1989). Statistical Science (Institute of Mathematical Statistics), 4(2), pp. 80-86.
Kutner, M. H., Nachtsheim, C. J. & Neter, J., 2004. Applied Linear Regression Models. Boston: McGraw-Hill/Irwin, 5(6), pp. 25-39.
Lindley, D. V., 1987. Regression and correlation analysis. New Palgrave: A Dictionary of Economics, 4(5), pp. 120-23.
Mahdavi, D. & Babak, 2012. The Misleading Value of Measured Correlation. Wilmott, 1(2012), pp. 64-73.
Mahdavi, D. B., 2013. The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model. Wilmott Magazine, 5(2), pp. 34-41.
Nikolić, D., Muresan, R. C., Feng, W. & Singer, W., 2012. Scaled correlation analysis: a better way to compute a cross-correlogram. European Journal of Neuroscience, 4(7), pp. 1-21.
Rodgers, J. L. & Nicewander, W. A., 1988. Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), pp. 59-66.
Székely, G. J., Rizzo, M. L. & Bakirov, N. K., 2007. Measuring and testing independence by correlation of distances. Annals of Statistics, 35(6), pp. 2769-2794.