Simple Data Analysis Investigation - Report Example

Summary
This report, "Simple Data Analysis Investigation", compares three data distributions. Their means differ from one another, although the means of distributions 1 and 3 are close to one another…

Extract of sample "Simple Data Analysis Investigation"

Your name
Instructor's name
Class name
Date assignment due

Simple Data Analysis Report

1. Introduction

This report uses descriptive statistics to analyze and compare three data sets, each generated from a different probability distribution: Distribution 1, Distribution 2, and Distribution 3. For the purposes of this report, it is assumed that graphs and summary tables give a quicker and easier way of understanding the data than the raw values do. However, comparison of the three data sets is limited by the large disparity in range between the distributions, which makes it difficult to present all three in one graph. The analysis uses descriptive measures of central tendency and variation, including the standard deviation, kurtosis, and skewness, to reveal the relationship between the three data sets in an engineering context. The variability and the distribution of the data sets are also analyzed.

2. Project Work: Data Analysis

To enable a better comparison between the three random data sets, taken from different probability distributions in the engineering sector, various analyses were conducted as illustrated below.

2.1. Descriptive statistics

To describe the characteristics of each distribution and the relationships among them, the data are summarized in a descriptive statistics table, shown below.

Table 1.
Joint descriptive statistics for the three distributions

Statistic            Distribution 1    Distribution 2    Distribution 3
Mean                 0.69136601        47.41486305       0.755006874
Median               0.706262121       46.35399952       0.901929762
Mode                 #N/A              #N/A              #N/A
Standard Deviation   0.096373963       28.45418991       0.280616036
Kurtosis             -1.160620685      -1.07227086       0.106719043
Skewness             -0.257885989      0.101666661       -1.217514961
Range                0.338817888       99.51351836       0.962505455
Minimum              0.501437529       0.431081214       0.024923257
Maximum              0.840255417       99.94459957       0.987428712
Count                150               150               150

2.2. Histograms

The general spread of each distribution can be illustrated by arranging the data sets into frequency distributions and then drawing a histogram of each. According to Harrison & Tamaschke (91), a histogram is a graphical representation that lets a reader comprehend the shape of a frequency distribution at a glance, and it therefore provides a quick and easy way to determine the distribution of each data set. The graphs below present the histogram of each individual data distribution.

Figure 1: Histogram for distribution 1.
Figure 2: Histogram for distribution 2.
Figure 3: Histogram for distribution 3.

An enhanced summary histogram below compares the distributions of the three data sets. Since there is a large disparity in numerical range between Distribution 2 and the rest, the values of this distribution are scaled down by a factor of 100 (i.e. a bin range of 100 is represented by 1).

Figure 4: Enhanced Histogram for the three data distributions.

2.3. Empirical rule

Of the three data sets analyzed, only Distribution 1 appears to follow a normal distribution, as shown in Figure 1 above. Therefore, only Distribution 1 is expected to conform to the empirical rule. Figure 5 below shows that Distribution 1 follows a bell-shaped distribution.

Figure 5.
Empirical rule for distribution 1

Since the distribution assumes a bell shape, the data are taken to come from a normal distribution and thus to conform to the empirical rule, which states that about 68% of the data points fall within ±1 standard deviation of the mean, 95% within ±2 standard deviations, and 99.7% within ±3 standard deviations (Black 61).

2.4. 5-number summary

The 5-number summary presented in Table 2 below gives a concise summary of each distribution through five key sample percentiles.

Table 2. A table showing the 5-number summary of the three distributions

2.5. Box plots

To compare the three distributions, box plots for all of them are drawn on the same scale, as illustrated in Figure 6 below.

Figure 6. Box plots for the distributions

As the box plots above show, it is difficult to compare distributions 1 and 3 on this scale, since their ranges are small, unlike distribution 2 with its large range. Therefore, a specific comparison between the two distributions is done by plotting side-by-side box plots for distribution 1 and 2. This is illustrated in Figure 7 below.

Figure 7. Side-by-side box plots for distribution 1 and distribution 2

2.6. Normal probability plot

To assess whether or not the distributions are approximately normal, normal probability plots of the data are drawn. Figures 8, 9, and 10 below show the normal probability plots for Distribution 1, 2, and 3 respectively.

Figure 8: Normal probability plot for distribution 1.
Figure 9: Normal probability plot for distribution 2.
Figure 10: Normal probability plot for distribution 3.

2.7. Time series graph

To show the data measurements in chronological order, time-series graphs are constructed for each of the distributions. A column of consecutive numbers representing the time index is created and placed on the horizontal axis, while the data values are plotted on the vertical axis.
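The time-index construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name with_time_index, the starting index of 1, and the sample values are hypothetical and are not taken from the report's data or its spreadsheet.

```python
def with_time_index(values, start=1):
    """Pair each observation with a consecutive time index (start, start + 1, ...)."""
    return list(enumerate(values, start=start))


# Illustrative values only, not the report's data.
observations = [0.71, 0.55, 0.83, 0.62]

series = with_time_index(observations)
times = [t for t, _ in series]   # values for the horizontal axis
levels = [y for _, y in series]  # values for the vertical axis
```

The two lists can then be passed to any plotting tool, with times on the horizontal axis and levels on the vertical axis.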
Figures 11, 12, and 13 below show the time-series graphs for distribution 1, 2 and 3 respectively.

Figure 11. Time series for distribution 1
Figure 12. Time series for distribution 2
Figure 13. Time series for distribution 3

3. Results/Discussion

The descriptive statistics show that the means of the three distributions differ from one another, although the means of distributions 1 and 3 are close to each other. The mean of distribution 2 is far greater than the other two.

According to Brase & Pellillo (83), variation in data sets is determined through the use of various measures of variation. These measures describe the spread of a distribution in relation to its measures of central tendency. The range of distribution 1 is smaller than the range of distribution 3, which is in turn far smaller than that of distribution 2. This means that the difference between the minimum and the maximum in distribution 2 is large. The standard deviation measures the spread of the data points (McBurney & White 149), showing how close the data points are to the mean. For instance, the high standard deviation of distribution 2 indicates that the data points in this distribution are spread out over a wide range of values. For distributions 1 and 3, the standard deviation is low, implying that the data points lie close to the mean.

The histograms show the frequencies of the data sets, giving an at-a-glance view of how they are distributed. According to Figure 1, distribution 1 assumes a normal distribution, unlike the other two. This is clearly illustrated in the enhanced histogram (Figure 4), and the bell-shaped graph for distribution 1 supports it.

The 5-number summaries provide a quick comparison of the range (from the minimum and maximum of the sample), the location (from the median), and the spread (from the quartiles) of the three data distributions (Black 23).
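As a rough illustration of how the range, location, and spread just described can be read off a sample, the sketch below computes a 5-number summary with Python's standard library. The function name five_number_summary and the sample values are hypothetical, and the "inclusive" quartile convention is an assumption; the report does not say which quartile method was used, and other conventions give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles, which interpolate between the
    # observed minimum and maximum of the sample.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Illustrative values only, not the report's data.
sample = [2, 4, 4, 5, 7, 9, 10, 12]

summary = five_number_summary(sample)
minimum, q1, median, q3, maximum = summary

data_range = maximum - minimum  # range: distance from minimum to maximum
iqr = q3 - q1                   # spread: inter-quartile range
```

The same five values are what a box plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the ends of the lines mark the minimum and maximum.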
The box plot is a visual presentation of the 5-number summary, which makes it possible to compare the three distributions quickly and at once (Moore 45). The median is indicated by the central dark line, while the first and third quartiles lie at the edges of the box, giving the inter-quartile range. The extreme ends of the lines indicate the maximum and minimum, and the difference between them gives the range of the distribution. At a glance (from the box plots above), the median, range, first quartile, and third quartile are greatest for distribution 2, followed by distribution 3 and lastly distribution 1.

From the normal probability plots, it is evident that the data for distribution 1 were generated from a normal distribution, while the data for distributions 2 and 3 did not originate from a normal distribution. The data for distribution 2 originated from a bimodal distribution.

The time series graphs also indicate how the data are distributed over time, in chronological order (Babbie 93). The time series graphs for distributions 1 and 2 indicate that the data are distributed evenly, but for distribution 3 the data are negatively skewed (i.e. concentrated to the right).

Through data analysis, raw data are turned into information that can be presented in graphs and tables for quicker and easier interpretation of results. It is therefore more effective to determine how the data in a given set are distributed from graphs or summary tables than from the raw data.

4. Conclusions

It is easier to understand and interpret analyzed data than raw data. Various data analyses are carried out for various reasons. This report has analyzed three distributions according to their central tendencies, variability, and distribution over a given range. Of the three data sets, distributions 1 and 3 had means close to one another, and the ranges of their data did not differ greatly.
However, the analysis indicates that the data for distribution 1 originated from a normal distribution, while the data for distribution 3 did not. In terms of variability, distribution 2 shows a high range as well as a high standard deviation, which implies that its data points lie far from the mean. Based on these results, future work on this topic should extend the analysis of central tendency and variance, for example by comparing the means using t-tests, rather than concentrating only on the nature of the distributions as this report does.

Works Cited

Babbie, Earl E. The Practice of Social Research. Mason, OH: Cengage Learning, 2010. Print.
Black, Ken. Business Statistics: Contemporary Decision Making. New York: John Wiley and Sons, 2009. Print.
Brase, Charles, and Pellillo, Brase. Understandable Statistics: Concepts and Methods. Mason, OH: Cengage Learning, 2011. Print.
Harrison, S., and U. Tamaschke. Applied Statistical Analysis. Melbourne: Prentice-Hall, 2007. Print.
McBurney, Donald, and Theresa White. Research Methods. Cengage Learning, 2009. Print.
Moore, David. The Basic Practice of Statistics. Boston, MA: Palgrave Macmillan, 2009. Print.
Cite this document
(Simple Data Analysis Investigation Report Example | Topics and Well Written Essays - 1500 words, n.d.)
Simple Data Analysis Investigation Report Example | Topics and Well Written Essays - 1500 words. https://studentshare.org/statistics/2047107-statistics-project-1-simple-data-analysis-comparison-report-guidelines
