Quantitative Data Analysis Assignment Example | Topics and Well Written Essays

QUANTITATIVE DATA ANALYSIS ASSIGNMENT

Part One

In this part we present a graph of JOBINC and interpret the information it conveys, comment on the summary statistics that describe the main features and distribution of JOBINC, and lastly use JOBINC to carry out significance tests on two and on more than two population means.

a) Graph (Box plot)

The graph below is a box plot of the variable JOBINC. Box plots describe the overall distribution of responses for a group and provide a useful way to visualise the range and other characteristics of responses for a large group. The graph shows a number of outliers in the data set for the variable JOBINC.

b) Summary statistics

Table 1 below presents the summary (descriptive) statistics for the variable JOBINC. The minimum value is 300 and the maximum is 4000, which gives a very large range of 3700. This shows how dispersed the data are, and the range is probably exaggerated by outliers. The mean applicant's monthly income from current job (JOBINC) is 994.89. The variance is very large (356704), further showing how dispersed the values are around the mean. The skewness is 2.085 (a value greater than zero), so the distribution is right skewed: most values are concentrated to the left of the mean, with extreme values to the right. Lastly, the kurtosis is 6.175, which exceeds the benchmark for a normal distribution (3 for raw kurtosis, 0 where the software reports excess kurtosis). The data therefore follow a leptokurtic distribution: more sharply peaked than a normal distribution, with values concentrated around the mean and thicker tails, which means a high probability of extreme values.

Table 1: Descriptive statistics

Statistic                  Value
N                          106
Range                      3700
Minimum                    300
Maximum                    4000
Mean                       994.8868
Std. Deviation             597.247
Variance                   356704
Skewness (Statistic)       2.084952
Skewness (Std. Error)      0.23464
Kurtosis (Statistic)       6.174709
Kurtosis (Std. Error)      0.465

It would therefore be advisable to exclude the outliers from the data, which would also help with the problem of non-normality.

c) Significance Tests

i) Based on two population means

In this section, we test whether there is any significant difference between the monthly income from the current job of the male participants and that of the female participants. The hypotheses are as follows:

H0: There is no significant difference in male and female monthly income
H1: There is a significant difference in male and female monthly income

Table 2: Group Statistics (jobinc)

sex   N    Mean      Std. Deviation   Std. Error Mean
1     91   1071.9    609.87526        63.93230
0     15   527.87    109.45245        28.26050

From Table 2, the mean monthly income from current job (JOBINC) is 1071.9 for the male participants and 527.87 for the female participants.

Table 3: Independent Samples Test (jobinc)

Levene's Test for Equality of Variances: F = 10.47, Sig. = .002

t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.432   104      .001              544               158.5                   229.7          858.3
Equal variances not assumed   7.783   103.26   .000              544               69.9                    405.4          682.6

Table 3 first reports Levene's test, which checks whether the two samples have equal variances (homogeneity of variance). The p-value associated with the F-value is 0.002, which is less than α=0.05, so we reject the hypothesis of equal variances and conclude that the two groups have unequal variances. The appropriate t-test is therefore the one in the "Equal variances not assumed" row: t = 7.783 on 103.26 degrees of freedom, with a p-value reported as 0.000 (that is, below 0.001). The "Equal variances assumed" row leads to the same decision, with a p-value of 0.001. Since the p-value is less than α=0.05, we reject the null hypothesis and conclude that there is a significant difference between the monthly income from the current job of the male participants and that of the female participants.

ii) Based on more than two population means
In this section, we test (through analysis of variance) whether there is any significant difference in the monthly income from the current job across job status (that is, management, supervisory and other). The hypotheses are as follows:

H0: There is no significant difference in monthly income among the job status groups
H1: There is a significant difference in monthly income among the job status groups

Table 4: Descriptive Statistics (JOBINC)

jobstat                N     Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper
1                      17    2067.8   652.7            158.3        1732.2         2403.3
2                      31    1101.1   214.9            38.6         1022.3         1180.0
3                      58    623.6    153.3            20.1         583.3          664.0
Total                  106   994.9    597.2            58.0         879.9          1109.9
Model: Fixed Effects                  304.4            29.6         936.3          1053.5
Model: Random Effects                                  427.9        -846.4         2836.1

Table 4 shows that the means and standard deviations of the monthly income from the current job differ clearly across job status (management, supervisory and other). The mean and standard deviation of income are 2067.8 and 652.7 for management employees, 1101.1 and 214.9 for supervisory employees, and 623.6 and 153.3 for other employees.

Table 5: ANOVA (jobinc)

                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   2.791E7          2     1.396E7       150.646   .000
Within Groups    9541975.939      103   92640.543
Total            3.745E7          105

In the ANOVA table (Table 5), the p-value is reported as 0.000 (a value less than α=0.05). We therefore reject the null hypothesis and adopt the alternative hypothesis that there is significant variation in monthly income across job status.

Post hoc test (LSD)

To identify which individual groups differ significantly from each other, we used the LSD post hoc test. The test was chosen for its ease of implementation. The results are presented in Table 6 below.

Table 6: Multiple Comparisons (dependent variable: jobinc; method: LSD)

(I) jobstat   (J) jobstat   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
1             2             966.63567*              91.85776     .000   784.4575       1148.8139
1             3             1444.12677*             83.94459     .000   1277.6425      1610.6111
2             1             -966.63567*             91.85776     .000   -1148.8139     -784.4575
2             3             477.49110*              67.71747     .000   343.1895       611.7927
3             1             -1444.12677*            83.94459     .000   -1610.6111     -1277.6425
3             2             -477.49110*             67.71747     .000   -611.7927      -343.1895

*. The mean difference is significant at the 0.05 level.

Table 6 reports the post hoc analysis used to find out which particular groups differ from each other under LSD. The table shows that every pair of groups differs. For instance, the p-value for management employees versus supervisory employees is 0.000 (a value less than α=0.05), so we reject the null hypothesis of equal means for that pair and conclude that monthly income differs significantly between management and supervisory employees. Similar findings are observed for management versus other employees and for supervisory versus other employees.

Part Two

In this part, we develop the "best" model to explain/predict JOBINC. We also state fully the assumptions underlying the estimation procedure used and identify the estimation problems that arise when these assumptions break down. The model used is the Ordinary Least Squares (OLS) regression model. Its assumptions are as follows:

i) Linearity of the regression model; linearity refers to the manner in which the parameters and the disturbance enter the equation.
ii) Full rank; there are no exact linear relationships among the variables.
iii) Spherical disturbances; this concerns the variances and covariances of the disturbances.
iv) Exogenously generated data; X may be fixed or random, but it is generated by a mechanism that is unrelated to U.
v) Normality; it is convenient to assume that the disturbances are normally distributed with zero mean and constant variance.
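Assumption ii) can be illustrated with the categorical variable jobstat. The sketch below is a hypothetical illustration rather than part of the original analysis: it builds a small design matrix for six made-up observations and computes its rank by exact Gaussian elimination. With an intercept plus one indicator column for each of the three job status levels, the indicator columns sum to the intercept column, so the matrix is rank deficient; dropping one level as the reference category restores full rank. The helper names `matrix_rank` and `design_matrix` and the sample codes are assumptions made for the example.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column at or below row `rank`.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(jobstat_codes, drop_reference):
    """Intercept column plus 0/1 indicator columns for jobstat levels 1..3."""
    levels = [1, 2, 3]
    if drop_reference:
        levels = levels[:-1]  # treat level 3 (other) as the reference category
    return [[1] + [1 if code == lvl else 0 for lvl in levels]
            for code in jobstat_codes]


# Hypothetical jobstat codes for six applicants (not taken from the data set).
codes = [1, 2, 3, 1, 2, 3]

full = design_matrix(codes, drop_reference=False)     # 4 columns
reduced = design_matrix(codes, drop_reference=True)   # 3 columns

rank_full = matrix_rank(full)
rank_reduced = matrix_rank(reduced)

print(f"all three indicators: {len(full[0])} columns, rank {rank_full}")
print(f"one level dropped:    {len(reduced[0])} columns, rank {rank_reduced}")
```

When the rank is below the number of columns, the OLS coefficients are not uniquely determined, which is why a categorical variable with k levels is normally entered as k - 1 indicator variables alongside the intercept.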
The model is given as follows:

$$\text{JOBINC} = \beta_0 + \beta_1\,\text{sex} + \beta_2\,\text{age} + \beta_3\,\text{jobyrs} + \beta_4\,\text{jobstat} + \beta_5\,\text{educ} + \beta_6\,\text{mstatus} + U$$

where $\beta_0$ is the intercept, $\beta_1$ is the coefficient for the variable sex, $\beta_2$ is the coefficient for the variable age, $\beta_3$ is the coefficient for the variable jobyrs, $\beta_4$ is the coefficient for the variable jobstat, $\beta_5$ is the coefficient for the variable educ, $\beta_6$ is the coefficient for the variable mstatus, and $U$ is the disturbance.

[Comment: Jobstat is categorical. Shouldn't it, and any other categorical variables, be entered as dummy variables?]

Table 7: Correlations

Pearson Correlation
          jobinc   sex     age     jobyrs   jobstat   educ    mstatus
jobinc    1.000    .319    .424    .277     -.845     .847    .245
sex       .319     1.000   -.006   .055     -.333     .287    .254
age       .424     -.006   1.000   .690     -.392     .362    .222
jobyrs    .277     .055    .690    1.000    -.237     .223    .137
jobstat   -.845    -.333   -.392   -.237    1.000     -.811   -.296
educ      .847     .287    .362    .223     -.811     1.000   .203
mstatus   .245     .254    .222    .137     -.296     .203    1.000

Sig. (1-tailed)
          jobinc   sex     age     jobyrs   jobstat   educ    mstatus
jobinc    .        .000    .000    .002     .000      .000    .006
sex       .000     .       .474    .288     .000      .001    .004
age       .000     .474    .       .000     .000      .000    .011
jobyrs    .002     .288    .000    .        .007      .011    .080
jobstat   .000     .000    .000    .007     .         .000    .001
educ      .000     .001    .000    .011     .000      .       .018
mstatus   .006     .004    .011    .080     .001      .018    .

The correlation matrix (Table 7) shows significant correlations among the variables. In particular, JOBINC is significantly correlated with every independent (explanatory) variable in the model.

Table 8: Model Summary (b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .894(a)   .798       .786                276.13717                    1.889

a. Predictors: (Constant), mstatus, jobyrs, sex, educ, age, jobstat
b. Dependent Variable: jobinc

In the model summary (Table 8), R-square is 0.798, implying that 79.8% of the variation in the dependent variable (JOBINC) is explained by the independent variables in the model.

Table 9: ANOVA (b)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   2.990E7          6     4984165.789   65.365   .000(a)
Residual     7548921.909      99    76251.736
Total        3.745E7          105

a. Predictors: (Constant), mstatus, jobyrs, sex, educ, age, jobstat
b. Dependent Variable: jobinc

The ANOVA table (Table 9) assesses the appropriateness of the model, that is, how well the model as a whole predicts the dependent variable. The p-value is reported as 0.000 (a value less than α=0.05); we thus reject the null hypothesis (that the model has no explanatory power) and conclude that the model is appropriate for predicting the dependent variable (JOBINC).

Table 10: Coefficients (a)

Model 1      B          Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   658.266    333.458              1.974    .051
sex          82.225     84.960       .048    .968     .335   .820        1.219
age          4.318      3.490        .084    1.237    .219   .441        2.268
jobyrs       1.360      5.883        .015    .231     .818   .516        1.939
jobstat      -337.764   64.771       -.424   -5.215   .000   .307        3.254
educ         89.518     15.255       .456    5.868    .000   .337        2.971
mstatus      -8.249     58.610       -.007   -.141    .888   .857        1.167

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: jobinc

Lastly, we look at the coefficients given in Table 10. Only jobstat and educ are statistically significant at α=0.05 (both with p-values reported as 0.000); the coefficients for sex, age, jobyrs and mstatus are not (p-values of 0.335, 0.219, 0.818 and 0.888 respectively). Each coefficient below is interpreted holding the other variables constant. The intercept is 658.3, implying that when all the explanatory variables are equal to zero we would expect JOBINC to be 658.3. The coefficient for sex is 82.2; this implies that moving from 0 (female) to 1 (male) increases the income by 82.2. The coefficient for age is 4.318, implying that for each unit increase in age we would expect the current monthly income to increase by 4.318. The coefficient for jobyrs is 1.36; each additional year that one has worked increases the monthly income by 1.36. Job status has a coefficient of -337.8; moving up one unit in the coding reduces the monthly income by 337.8. For instance, moving from 1 (managerial position) to 2 (supervisory position) decreases the monthly income by 337.8 (an absolute amount, not a factor).
Educ has a coefficient of 89.5; this means that each additional year in school is expected to increase the monthly income by 89.5. Lastly, the coefficient for marital status (mstatus) is -8.249; moving up one unit in the coding reduces the monthly income by 8.249. For instance, moving from 0 (not married) to 1 (married) decreases the monthly income by 8.249 (an absolute amount, not a factor of 8.249).

Based on the above analysis, the model is thus given as:

$$\widehat{\text{JOBINC}} = 658.266 + 82.225\,\text{sex} + 4.318\,\text{age} + 1.360\,\text{jobyrs} - 337.764\,\text{jobstat} + 89.518\,\text{educ} - 8.249\,\text{mstatus}$$

Estimation problems that arise when the OLS assumptions are violated include:

1. Violating the assumption of linearity gives biased estimates of the coefficients.
2. If the assumption of spherical disturbances is violated (heteroscedasticity or autocorrelation), the OLS estimators and the regression predictions based on them remain unbiased and consistent.
3. In that case, however, the OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
4. Because the usual estimator of the covariance matrix of the estimated regression coefficients is then inconsistent, the tests of hypotheses (t-test, F-test) are no longer valid.
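Unequal variances are not only a theoretical concern in this data set: Levene's test in Table 3 gave a p-value of 0.002 for the male and female groups. As a rough cross-check, the sketch below recomputes the unequal-variance (Welch) standard error, t statistic and Satterthwaite degrees of freedom from the group summary statistics in Table 2. The function name `welch_t` is illustrative and not part of the original analysis, which reports statistical-package output only.

```python
import math

# Group summary statistics as reported in Table 2 (sex: 1 = male, 0 = female).
n_m, mean_m, sd_m = 91, 1071.9, 609.87526
n_f, mean_f, sd_f = 15, 527.87, 109.45245


def welch_t(n1, m1, s1, n2, m2, s2):
    """Unequal-variance t-test computed from summary statistics.

    Returns the standard error of the mean difference, the t statistic
    and the Welch-Satterthwaite degrees of freedom.
    """
    v1 = s1 ** 2 / n1  # squared standard error of the first group mean
    v2 = s2 ** 2 / n2  # squared standard error of the second group mean
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df


se, t_stat, df = welch_t(n_m, mean_m, sd_m, n_f, mean_f, sd_f)
print(f"SE = {se:.1f}, t = {t_stat:.3f}, df = {df:.2f}")
```

The results agree, to rounding, with the "Equal variances not assumed" row of Table 3 (standard error 69.9, t = 7.783, df = 103.26). Because the variances are not pooled, this version of the test remains usable when the two groups have different spreads.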
