Understanding and Exploring Assumptions Assignment Example | Topics and Well Written Essays

Activity 4 Why do we care whether the assumptions required for statistical tests are met? It is very important that the assumptions for statistical tests are met because the appropriate interpretation of the statistics would depend on whether the applicable assumptions are met. For instance, for inferential statistics, it is usually important that the sample is a random sample so that the sample characteristics would have valid information on population characteristics. Further, some statistics that are used to compare samples require that we explicitly identify what assumption would be likely applicable, or whether we should assume that variance of the two samples are equal or unequal. The assumption of normality or non-normality of distribution will also inform us the extent to which we can use a normal distribution or a distribution like the chi-square distribution to make inferences from data. 2. Open the data set that you corrected in activity #3 for DownloadFestival.sav. You will use the following variables: Day1, Day2, and Day3 (hygiene variable for all three days). Create a simple histogram for each variable. Choose to display the normal curve (under Element Properties) and title your charts. Copy these plots into your Activity #4 Word document. Histogram of Variables Day1, Day2, and Day3 3. Now create probability-probability (p-p) plots for each variable. This output will give you additional information. Read over the Case Processing Summary. Notice that there is missing data for Days 2 and Day 3? Copy only the Normal p-p Plots into your Activity #4 Word document (you do not need to copy the beginning output nor the Detrended Normal p-p Plots). 4. Examining the histograms and p-p plots describe the dataset, with particular attention toward the assumption of normality. For each day, do you think the responses are reasonably normally distributed? (Just give your impression of the data.) Why or why not? Based on the p-p plot for each variable, using “normal” as the test distribution, variables “day 1”, “day 2”, and “day 3” are most likely normally distributed. They are most likely normally distributed or that the assumption of normal distribution is justified because according to the SPSS help facility, “if the selected variable matches the test distribution, the points cluster around a straight line.” In the SPSS output I have produced for item 3, the test distribution I selected is the “normal distribution”. However, I must point out that based on the SPSS outputs I have produced for item 2, while variable “day 1” appears to be normally distributed, variables “day2” and “day2” are positively skewed distribution. 5. Using the same dataset, and the Frequency command, calculate the standard descriptive measures (mean, median, mode, standard deviation, variance and range) as well as kurtosis and skew for all three hygiene variables. Paste your output into your Activity #4 Word document (you do not need to paste the Frequency Table). What does the output tell you? You will need to comment on: sample size, measures of central tendency and dispersion and well as kurtosis and skewness. You will need to either calculate z scores for skewness and kurtosis or use those given in the book to provide a complete answer. Bottom line: is the assumption of normality met for these three variables? Does this match your visual observations from question #2? Hygiene (Day 1 of Download Festival) Hygiene (Day 2 of Download Festival) Hygiene (Day 3 of Download Festival) N Valid 810 264 123 Missing 0 546 687 Mean 1.7934 .9609 .9765 Median 1.7900 .7900 .7600 Mode 2.00 .23 .44(a) Std. Deviation .94449 .72078 .71028 Variance .892 .520 .504 Skewness 8.865 1.095 1.033 Std. Error of Skewness .086 .150 .218 Kurtosis 170.450 .822 .732 Std. Error of Kurtosis .172 .299 .433 Range 20.00 3.44 3.39 a Multiple modes exist. The smallest value is shown All samples are large samples, implying that we can use the assumption of normality for all distributions (Walpole et al., 2007, p. 182-185, 245-247). The relative closeness of the values of mean, median, and mode for “day1” and “day3” suggest a symmetric or normal distribution. Based on the measures of the skew, “day1” is more positively skewed compared to “day2” and “day3”. The variance values among the three variables suggest that they are relatively homogenous because the values are close to zero. Among the three variables, “day1” has the greatest excess kurtosis based on Gujarati (1995, p. 143). The assumptions of normality are not basically met for all three variables because their skew and kurtosis values are generally twice their standard errors (except for variable “day3” in which kurtosis is less than twice the standard error). Nevertheless, the conventional statistical theory says we can validly assume that they are normally distributed with regard to the computation of the mean (Walpole et al., 2007, p. 182-185, 245-247). The figures for skew and kurtosis provided by the table do not match my interpretation of the graphics I obtained in item 2. 6. Using the dataset SPSSExam.sav, and the Frequency command, calculate: the standard descriptive statistics (mean, median, mode, standard deviation, variance and range) plus skew and kurtosis, and histograms with the normal curve on the following variables: Computer, Exam, Lecture, and Numeracy for the entire dataset. Complete the same analysis using University as a grouping variable. Paste your output into your Activity #4 Word document (you do not need to paste the Frequency Table). What do the results tell you with regard to whether the data is normally distributed? Percentage on SPSS exam Computer literacy Numeracy Percentage of lectures attended N Valid 100 100 100 100 Missing 0 0 0 0 Mean 58.10 50.71 4.85 59.765 Median 60.00 51.50 4.00 62.000 Mode 72(a) 54 4 48.5(a) Variance 454.354 68.228 7.321 470.230 Range 84 46 13 92.0 a Multiple modes exist. The smallest value is shown Grouping Variable: University Percentage on SPSS exam Computer literacy Numeracy Computer literacy University Duncetown University Mean 40 50 4 50 Mode 34 48 4 48 Median 38 49 4 49 Maximum 66 67 9 67 Minimum 15 35 1 35 Standard Deviation 13 8 2 8 Variance 158 65 4 65 Sussex University Mean 76 51 6 51 Mode 72 54 5 54 Median 75 54 5 54 Maximum 99 73 14 73 Minimum 56 27 1 27 Standard Deviation 10 9 3 9 Variance 104 72 9 72 7. Using the dataset SPSSExam.sav, determine whether the scores on computer literacy and percentage of lectures attended (with University as a grouping variable) meet the assumption of homogeneity of variance (use Levene test). You must remember to unclick the split file option used above before doing this test. What does the output tell you? (Be as specific as possible.) Test of Homogeneity of Variances Levene Statistic df1 df2 Sig. Percentage on SPSS exam 2.584 1 98 .111 Computer literacy .064 1 98 .801 Percentage of lectures attended 1.731 1 98 .191 Numeracy 7.368 1 98 .008 The null hypothesis in the Levene test for homogeneity is that there is homogeneity of variance. As indicated by the table immediately above, we are unable to reject the null hypothesis for all variables except for numeracy. 8. Describe the assumptions of normality and homogeneity of variance. When these assumptions are violated, what are your options? Are there cases in which the assumptions may technically be violated, yet have no impact on your intended analyses? Explain. When the null hypothesis of homogeneity of variance can be rejected then the option is to test hypotheses based on the separate variance t-test (Kinnear and Grav, 2008, p. 200). Bibliography Kinnear, P. & Gray, C. (2008). SPSS 15 made simple. Hove and New York: Taylor and Francis Group. SPSS, Inc. (2005). SPSS Version 15. A statistical software with tutorial. Surrey, UK: SPSS UK Limited. Walpole, R., Myers, R., Myers, S., and Ye, Keying (2007). Probability & Statistics for Engineers & Scientists. 8th ed. New Jersey: Pearson Educational International. Read More

Understanding and Exploring Assumptions - Assignment Example

Extract of sample "Understanding and Exploring Assumptions"