Simple Data Analysis and Comparison - Term Paper Example

Summary

This term paper, "Simple Data Analysis and Comparison", presents data analysis as crucial to transforming, remodeling, and revising data with a view to reaching a decision for a given problem or situation (Maindonald and Braun 2010)…

Subject: Statistics
Type: Term Paper
Level: Undergraduate
Pages: 5 (1250 words)

Extract of sample "Simple Data Analysis and Comparison"

Simple Data Analysis & Comparison

1. Introduction

Data analysis is crucial in transforming, remodeling and revising data with a view to reaching a decision for a given problem or situation (Maindonald and Braun 2010). The analysis and comparison of the three data sets, each with 150 observations, provides insight into various aspects of the data structure and so improves understanding of the information the data sets present. In this report, three data sets generated from different probability distributions are analyzed in terms of descriptive measures, variability and distribution, and the information is presented in tables and graphs for easy interpretation.

2. Data Analysis

The three data sets, generated from various distributions, are analyzed to reveal the relationships that exist among them. The analyses comprise descriptive statistics, histograms, the 5-number summary, the normal distribution, box plots, and time series graphs. All analyses are conducted in Microsoft Excel.

2.1. Descriptive statistics

To summarize the characteristics of the data sets, and to give insight into the aspects of data structure needed for further analysis, the data were summarized in a descriptive statistics table (Table 1). Our interest is in the means, medians, modes, standard deviations, kurtosis, skewness, range, minimum, maximum and count.

Table 1. Descriptive statistics for the three data sets

Statistic             Data Set 1      Data Set 2      Data Set 3
Mean                  13.587926       10.64549        10.72431557
Median                13.5235         10.85           7.21565
Mode                  13.009          #N/A            #N/A
Standard Deviation    2.913681287     1.370431049     12.44222823
Kurtosis              0.00693987      -1.060587733    19.06739764
Skewness              0.179810096     -0.200287963    3.337320781
Range                 15.4061         4.9744          101.842005
Minimum               7.0389          8.0166          0.087995
Maximum               22.445          12.991          101.93
Count                 150             150             150

2.2. Histograms

Hale (1992) defines a histogram as a graphical representation of the frequency or density of a single quantitative measure. The general spread of each of the three data sets is illustrated in a separate figure (Figures 1 to 3) by first arranging the data into frequency distributions and then constructing the histograms.

Figure 1. Histogram for data set 1.
Figure 2. Histogram for data set 2.
Figure 3. Histogram for data set 3.

To give a better comparison between the three data sets, a summary histogram is constructed. This is illustrated in Figure 4.

Figure 4. Combined histograms for the three data sets.

2.3. Empirical rule

Judging by the shapes of the histograms above, only data set 1 conforms to the normal distribution. It is therefore expected to follow the empirical rule, which states that for a bell-shaped distribution about 68% of the data points fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.

Figure 5. Empirical rule for distribution 1.

2.4. 5-number summary

A 5-number summary is constructed to provide the information needed to draw box plots for the data sets. These five sample percentiles give a concise summary of how the data are distributed.

Table 2. 5-number summary of the three data sets

Statistic    Data Set 1    Data Set 2    Data Set 3
Min          7.0389        8.0166        0.087995
Q1           11.6895       9.54725       2.601525
Median       13.5235       10.85         7.21565
Q3           15.53925      11.76275      14.5385
Max          22.445        12.991        101.93

2.5. Box plots

For easy comparison of the distributions of the three data sets, a box plot generator for Excel is used to produce side-by-side box plots. This is illustrated in Figure 6.

Figure 6. Box plots for the data sets.

2.6. Normal probability plot

Normal probability plots are constructed for each of the data sets in order to assess whether they are normally distributed. Figures 7, 8 and 9 show the normal probability plots for the three data sets.

Figure 7. Normal probability plot for data set 1.
Figure 8. Normal probability plot for data set 2.
Figure 9. Normal probability plot for data set 3.

2.7. Time series graph

Time series graphs are constructed to present the data in chronological order (Hamilton 1994). A column for the time index is first constructed and the data points are placed alongside it. The graphs are then constructed by plotting the data values on the vertical axis and the time index on the horizontal axis. The graphs for data sets 1, 2 and 3 are shown in Figures 10, 11 and 12 respectively.

Figure 10. Time series for data set 1.
Figure 11. Time series for data set 2.
Figure 12. Time series for data set 3.

3. Results/Discussion

The descriptive statistics summarize the important measures of a distribution, i.e. central tendency and variability, and supply the information needed for further analysis (Johnson and Bhattacharyya 2009). Data sets 2 and 3 have close means (about 10.65 and 10.72), while data set 1 has a higher mean (about 13.59). Considering variability, the small range of data set 2 (4.9744) indicates that its data points lie close to one another. Furthermore, its standard deviation is small (1.370431049), implying that the data points are close to the mean.
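The comparison of spread can be made concrete using the quartiles reported in Table 2. The following is a minimal Python sketch (the paper's own analysis was done in Excel) that computes the range, the interquartile range (IQR) and the conventional 1.5 x IQR fences for each data set. The names FIVE_NUMBER and spread_summary are illustrative, and the 1.5 x IQR fence rule is an assumption introduced here as a common convention, not a method stated in the paper.

```python
# Five-number summaries copied from Table 2: (min, Q1, median, Q3, max).
FIVE_NUMBER = {
    "Data Set 1": (7.0389, 11.6895, 13.5235, 15.53925, 22.445),
    "Data Set 2": (8.0166, 9.54725, 10.85, 11.76275, 12.991),
    "Data Set 3": (0.087995, 2.601525, 7.21565, 14.5385, 101.93),
}


def spread_summary(summary, k=1.5):
    """Return range, IQR, fences, and whether min/max fall outside the fences.

    k=1.5 is the conventional fence multiplier (an assumption, not from the paper).
    """
    minimum, q1, _median, q3, maximum = summary
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {
        "range": maximum - minimum,
        "iqr": iqr,
        "lower_fence": lower,
        "upper_fence": upper,
        "min_outside": minimum < lower,
        "max_outside": maximum > upper,
    }


if __name__ == "__main__":
    for name, summary in FIVE_NUMBER.items():
        s = spread_summary(summary)
        print(f"{name}: range={s['range']:.4f}, IQR={s['iqr']:.4f}, "
              f"fences=({s['lower_fence']:.4f}, {s['upper_fence']:.4f}), "
              f"max beyond upper fence: {s['max_outside']}")
```

Under these assumptions, the maximum of data set 3 (101.93) lies far above its upper fence of about 32.44, which is consistent with the outlier noted for that data set. The maximum of data set 1 (22.445) also falls slightly above its fence of about 21.31, while data set 2 has no values outside its fences.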
According to McBurney and White (2009), the standard deviation shows the closeness of data points to the mean. The high standard deviation of data set 3 (12.44222823) indicates that the data points in this set are widely dispersed around the mean. The closeness of the mean, median and mode of data set 1 suggests that the data originated from a normal distribution. This is supported by the data's skewness, which is close to zero (0.179810096). Testing the empirical rule for data set 1 shows that it is bell shaped, and it can therefore be concluded that the data come from a normal distribution.

A quick look at the shapes of the histograms reveals the nature of each distribution. Data set 3 is positively skewed (skewed to the right), and one observation appears to be an outlier, which accounts for the high range. Data set 2 is slightly skewed to the left and its peaks are of almost the same height, which is why its kurtosis is negative (-1.060587733).

The 5-number summary provides ready information for plotting box plots, which give a visual basis for comparing the range (from the maximum and the minimum), the spread (from the quartiles) and the location (from the median) (Black 2009).

The normal probability plot for data set 1 indicates that the data were generated from a normal distribution, while the plots for data sets 2 and 3 indicate that those data were not. As a chronological presentation of the data, the time series for data sets 1 and 2 indicate that the values are almost evenly distributed over time. This pattern is, however, not seen in data set 3.

4. Conclusions

The information gathered from the data analysis of the three distributions was presented in graphs and tables for quick and easy interpretation. The report analyzed and compared the three data sets in terms of their means, variability and the nature of the distribution of the data points.
Descriptive statistics indicated nearly equal means for data sets 2 and 3. Data set 1 follows a normal distribution, as shown by its histogram, its bell shape and its normal probability plot. High variance was observed in data set 3, which also had a very large range. In general, data analysis turns raw data into information that can be presented in graphs and tables, which offer a quicker and easier way to understand what the data show and so support decision making. It is therefore more effective to determine how data are distributed from graphs or summary tables than from the raw data.

5. References

Black, K. 2009. Business statistics: Contemporary decision making. New York: John Wiley and Sons.
Hale, R. L. 1992. MYSTAT: Statistical applications, Volume 3. Melbourne: Course Technology.
Hamilton, J. D. 1994. Time series analysis. Chichester, UK: Princeton University Press.
Johnson, R. A. and Bhattacharyya, G. K. 2009. Statistics: Principles and methods. New York: John Wiley and Sons.
Maindonald, J. and Braun, W. J. 2010. Data analysis and graphics using R: An example-based approach. London: Cambridge University Press.
McBurney, D. and White, T. 2009. Research methods. Mason, OH: Cengage Learning.
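As a supplement to the empirical rule quoted in Section 2.3, the percentages can be checked against the normal distribution itself, and the matching intervals for data set 1 follow from the mean and standard deviation in Table 1. The sketch below uses Python's standard statistics.NormalDist rather than Excel. The function names coverage and interval are illustrative, and treating data set 1 as exactly normal is an assumption.

```python
from statistics import NormalDist

# Mean and standard deviation of data set 1, copied from Table 1.
MEAN_1 = 13.587926
SD_1 = 2.913681287


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


def interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)


if __name__ == "__main__":
    for k in (1, 2, 3):
        low, high = interval(MEAN_1, SD_1, k)
        print(f"within {k} sd: {coverage(k):.4f} of a normal population, "
              f"interval for data set 1: ({low:.3f}, {high:.3f})")
```

With 150 observations, the rule would predict roughly 102, 143 and 150 points inside the three intervals. Counting the actual points requires the raw data, which are not reproduced in this extract.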
