Simple Data Analysis Investigation Report Example

Your name
Instructor’s name
Class name
Date assignment due

Simple Data Analysis Report

1. Introduction

This report uses descriptive statistics to analyze and compare three data sets, each generated from a different probability distribution: Distribution 1, Distribution 2, and Distribution 3. It assumes that graphs and summary tables convey the information in the data more quickly and easily than the raw data do. Comparison of the three data sets is limited, however, by the much wider range of Distribution 2, which makes it difficult to present all three distributions in one graph. The discussion analyzes the three distributions using measures of central tendency, measures of variation (including the standard deviation), kurtosis, and skewness to reveal the relationship between the three data sets in an engineering context. The variability and the shape of each data set are also analyzed.

2. Project Work - Data Analysis

To enable a better comparison between the three random data sets, which were taken from different probability distributions in the engineering sector, the analyses described below were conducted.

2.1. Descriptive statistics

To describe the characteristics of each distribution, and the relationship among the variables in these distributions, the data are summarized in the descriptive statistics table below.

Table 1. Joint descriptive statistics for the three distributions

Statistic             Distribution 1    Distribution 2    Distribution 3
Mean                  0.69136601        47.41486305       0.755006874
Median                0.706262121       46.35399952       0.901929762
Mode                  #N/A              #N/A              #N/A
Standard Deviation    0.096373963       28.45418991       0.280616036
Kurtosis              -1.160620685      -1.07227086       0.106719043
Skewness              -0.257885989      0.101666661       -1.217514961
Range                 0.338817888       99.51351836       0.962505455
Minimum               0.501437529       0.431081214       0.024923257
Maximum               0.840255417       99.94459957       0.987428712
Count                 150               150               150

2.2. Histograms

The general spread of each distribution can be illustrated by arranging the data set into a frequency distribution and then drawing its histogram. According to Harrison & Tamaschke (91), histograms are graphical representations that allow a reader to comprehend the shape of a frequency distribution at a glance, and they therefore provide a quick and easy way to determine the distribution of each data set. The graphs below present the histogram of each individual distribution.

Figure 1: Histogram for distribution 1.
Figure 2: Histogram for distribution 2.
Figure 3: Histogram for distribution 3.

An enhanced summary histogram below compares the three data sets. Since there is a big disparity in numerical range between Distribution 2 and the other two, the values of this distribution are scaled down by 100 (i.e. a bin range of 100 is represented by 1).

Figure 4: Enhanced Histogram for the three data distributions.

2.3. Empirical rule

Of the three data sets analyzed, only Distribution 1 appears to follow a normal distribution, as shown in Figure 1 above. Therefore, only Distribution 1 is expected to conform to the empirical rule. Figure 5 below shows that Distribution 1 follows a bell-shaped distribution.

Figure 5. Empirical rule for distribution 1

Since the distribution assumes a bell shape, the data are taken to come from a normal distribution and thus to conform to the empirical rule, which states that about 68% of the data points fall within ±1 standard deviation of the mean, 95% within ±2 standard deviations, and 99.7% within ±3 standard deviations (Black 61).

2.4. 5-number summary

The 5-number summary gives a concise description of how the data are distributed. Table 2 below lists the five key sample percentiles for each data set.

Table 2. A table showing the 5-number summary of the three distributions

2.5. Box plots

To compare the three distributions, box plots for all of them are drawn on the same scale, as illustrated in Figure 6 below.

Figure 6. Box plots for the distributions

As the box plots above show, it is difficult to compare distributions 1 and 3 on this scale, since their ranges are small relative to the much larger range of distribution 2. Therefore, a specific comparison between the two distributions is made by plotting side-by-side box plots for distributions 1 and 3. This is illustrated in Figure 7 below.

Figure 7. Side-by-side box plots for distribution 1 and distribution 3

2.6. Normal probability plot

To assess whether or not the data sets are approximately normally distributed, normal probability plots are drawn. Figures 8, 9, and 10 below show the normal probability plots for Distributions 1, 2, and 3 respectively.

Figure 8: Normal probability plot for distribution 1.
Figure 9: Normal probability plot for distribution 2.
Figure 10: Normal probability plot for distribution 3.

2.7. Time series graph

To show the data measurements in chronological order, time-series graphs are constructed for each of the distributions. A column of consecutive numbers representing the time index is created and placed on the horizontal scale, while the data values are plotted on the vertical scale.
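The time-index construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `with_time_index`, the starting index of 1, and the sample values are assumptions made here, not details taken from the report, which does not say what tool produced its graphs.

```python
def with_time_index(values, start=1):
    """Pair each observation with a consecutive integer time index.

    The starting index of 1 is an assumption; the report only says that
    a column of consecutive numbers was created.
    """
    return list(enumerate(values, start=start))


# Hypothetical observations, not the report's data.
sample = [0.71, 0.55, 0.83, 0.62]
series = with_time_index(sample)

# The time index goes on the horizontal scale, the data on the vertical scale.
time_axis = [t for t, _ in series]
value_axis = [v for _, v in series]
print(time_axis)
print(value_axis)
```

The two lists can then be passed to any plotting tool, with `time_axis` as the horizontal coordinate and `value_axis` as the vertical coordinate.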
Figures 11, 12, and 13 below show the time-series graphs for distributions 1, 2, and 3 respectively.

Figure 11. Time series for distribution 1
Figure 12. Time series for distribution 2
Figure 13. Time series for distribution 3

3. Results/Discussion

The descriptive statistics show that the means of the three distributions differ from one another, although the means of distributions 1 and 3 are close to each other. The mean of distribution 2 is much greater than either of them.

According to Brase & Brase (83), variation in data sets is determined through the use of various measures of variation. These measures describe the spread of a distribution in relation to its measures of central tendency. The range of distribution 1 is smaller than the range of distribution 3, which in turn is far smaller than the range of distribution 2. This means that the difference between the minimum and the maximum value in distribution 2 is large.

The standard deviation measures the spread of the data points (McBurney & White 149); it shows how close the data points are to the mean. For instance, the high standard deviation of distribution 2 indicates that the data points in this distribution are spread out over a wide range of values. For distributions 1 and 3, the standard deviation is low, implying that the data points lie close to the mean.

The histograms show the frequencies in each data set and give an at-a-glance view of how the data are distributed. According to Figure 1, distribution 1 follows a normal distribution, unlike the other two. This is also illustrated in the enhanced histogram (Figure 4). The “bell-shaped” graph for distribution 1 supports this.

The 5-number summaries provide a quick comparison of the range (from the minimum and maximum of the sample), the location (from the median), and the spread (from the quartiles) of the three data distributions (Black 23).
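As a rough illustration of how the minimum, quartiles, median, and maximum discussed here yield the range, location, and spread, the following Python sketch computes a 5-number summary. The function name `five_number_summary` and the sample values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the report does not state which convention it used.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of two or more values."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile convention. The report does not
    # say which convention produced its Table 2.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample, not the report's data.
sample = [0.52, 0.61, 0.66, 0.70, 0.71, 0.74, 0.79, 0.83]
minimum, q1, median, q3, maximum = five_number_summary(sample)

print("range:", maximum - minimum)      # from the minimum and maximum
print("location (median):", median)     # from the median
print("spread (IQR):", q3 - q1)         # from the quartiles
```

Different quartile conventions can give slightly different values for Q1 and Q3 on small samples, so results from this sketch may not match a table produced with another tool.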
The box plot is a visual presentation of the 5-number summary, which makes it possible to compare the three distributions quickly and at once (Moore 45). The median is indicated by the central dark line, while the first and third quartiles lie at the edges of the box and together give the inter-quartile range. The ends of the lines indicate the maximum and the minimum, and the difference between them gives the range of the distribution. At a glance (from the box plots above), the median, range, first quartile, and third quartile of distribution 2 are the greatest, followed by those of distribution 3 and lastly those of distribution 1.

From the normal probability plots, it appears that the data for distribution 1 were generated from a normal distribution, while the data for distributions 2 and 3 were not. The data for distribution 2 appear to have originated from a bimodal distribution.

The time-series graphs indicate how the data are distributed in chronological order (Babbie 93). The time-series graphs for distributions 1 and 2 indicate that the data are spread evenly, whereas for distribution 3 the data are negatively skewed (i.e. concentrated at the higher values).

Through data analysis, raw data are turned into information that can be presented in graphs and tables for quicker and easier interpretation of results. It is therefore quicker to determine how the data in a given set are distributed from graphs or summary tables than from the raw data.

4. Conclusions

Analyzed data are easier to understand and interpret than raw data, and different analyses serve different purposes. This report has analyzed three distributions according to their central tendency, variability, and distribution over a given range. Of the three data sets, distributions 1 and 3 had means close to one another, and their ranges did not differ greatly.
However, the analysis indicates that the data for distribution 1 originated from a normal distribution, while the data for distribution 3 did not. In terms of variability, distribution 2 has a large range as well as a high standard deviation, which implies that its data points lie far from the mean. Based on these results, future work on this topic should extend the analysis of central tendency, for example by comparing the means using t-tests, and should also analyze the variances, rather than concentrating only on the nature of the distributions as this report does.

Works Cited

Babbie, Earl E. The Practice of Social Research. Mason, OH: Cengage Learning, 2010. Print.
Black, Ken. Business Statistics: Contemporary Decision Making. New York: John Wiley and Sons, 2009. Print.
Brase, Charles Henry, and Corrinne Pellillo Brase. Understandable Statistics: Concepts and Methods. Mason, OH: Cengage Learning, 2011. Print.
Harrison, S., and U. Tamaschke. Applied Statistical Analysis. Melbourne: Prentice-Hall, 2007. Print.
McBurney, Donald, and Theresa White. Research Methods. Cengage Learning, 2009. Print.
Moore, David. The Basic Practice of Statistics. Boston, MA: Palgrave Macmillan, 2009. Print.
