# Data Mining - Lab Report Example


## Extract of sample "Data Mining"

Data mining
June 18
The survey aimed to gather information on students' backgrounds in order to inform teaching practices. The study's variables were gender, number of previous data science courses taken by a student, students' self-assessed data mining efficiency, future career goals, geo-location, and preference for a one-by-one virtual meeting. Issues with the collected data, the cleaning applied, and the analysis results are discussed. SPSS software was used for the analysis.
Data issues and cleaning
Missing data was the most prevalent issue in the data set (Tan, Steinbach, & Kumar, 2006). All data for one participant (ID R_wZTAo2AjoAUTWvf) were missing. In addition, data on the number of science related courses that a student had taken and on the years of professional experience that a student had prior to the course were missing for some of the participants. Some values of expected salary for the first job were also unrealistically low and required cleaning. The mean was used to fill in the number of previous science courses, professional experience, and expected salary, while the mode was used for ordinal data.
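The cleaning step described above can be sketched in a few lines. This is only an illustration in Python, not the SPSS procedure used for the report: the helper names `impute_mean` and `impute_mode` are hypothetical, and the sample values are made up for the example rather than taken from the survey.

```python
from statistics import mean, mode


def impute_mean(values):
    """Replace None entries with the mean of the observed values (numeric scale data)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None entries with the most frequent observed value (ordinal data)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


# Made-up illustrative values, NOT the survey data.
courses = [3.0, None, 4.0, 2.0, 3.0]
efficiency = ["fair", "good", None, "fair", "poor"]

courses_clean = impute_mean(courses)
efficiency_clean = impute_mode(efficiency)
```

One side effect of mean imputation is that the filled-in value is usually not a whole number, so it can later surface as a median or mode of a variable that otherwise holds only counts.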
The following table summarizes descriptive statistics of the numeric scale variables.
Table 1: Descriptive statistics

| Variable | N | Minimum | Maximum | Mean | Std. Deviation | Skewness (Statistic) | Skewness (Std. Error) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| previous data science related courses | 23 | .00 | 4.00 | 2.8226 | 1.05466 | -1.098 | .481 |
| previous years of professional experience in data areas | 23 | .00 | 21.00 | 3.6478 | 4.17722 | 3.425 | .481 |
| expected salary for first job | 23 | 29795.78 | 145000.00 | 46605.9435 | 32496.99730 | 2.341 | .481 |
| Valid N (listwise) | 23 | | | | | | |

All three variables are skewed: each skewness statistic is more than twice its standard error, which indicates significant skew (p < 0.05). The median is therefore the best descriptive statistic. The following table shows the statistics.
Table 2: Median for the numeric variables

| Statistic | previous data science related courses | previous years of professional experience in data areas | expected salary for first job |
| --- | --- | --- | --- |
| N (Valid) | 23 | 23 | 23 |
| N (Missing) | 0 | 0 | 0 |
| Mean | 2.8226 | 3.6478 | 46605.9435 |
| Median | 3.0000 | 3.6500 | 29795.7800 |
| Mode | 2.82a | 3.65 | 29795.78 |

a. Multiple modes exist. The smallest value is shown.

The typical student, therefore, had taken about three data science related courses and had about 3.65 years of professional experience in data areas. The students expected a first salary of about \$29,795.78.
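The skewness figures in Table 1 can be checked with a short calculation. The sketch below assumes the common rule of thumb that a distribution is significantly skewed at the 0.05 level when the skewness statistic divided by its standard error exceeds 1.96 in absolute value; the report does not say which test was applied, so this is an assumption. The function names are hypothetical, and the standard error uses the usual small-sample formula, which gives the .481 reported for N = 23.

```python
import math


def skewness_standard_error(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def is_significantly_skewed(skewness, n, critical_z=1.96):
    """Rule of thumb (an assumption here): |skewness / SE| > 1.96 means p < 0.05."""
    return abs(skewness / skewness_standard_error(n)) > critical_z


N = 23
# Skewness statistics copied from Table 1.
table1_skewness = {
    "previous data science related courses": -1.098,
    "previous years of professional experience": 3.425,
    "expected salary for first job": 2.341,
}

for name, skew in table1_skewness.items():
    ratio = skew / skewness_standard_error(N)
    print(f"{name}: skewness/SE = {ratio:.2f}, skewed = {is_significantly_skewed(skew, N)}")
```

Under this rule all three variables come out as skewed, with the number of courses skewed to the left and the other two to the right.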
A majority of the students (60.9 percent) rated their data mining efficiency as fair, while only 8.7 percent rated it as good. Only 21.7 percent had much confidence in becoming data analysts after graduation, while 56.5 percent were not sure of their positions. Most of the students lived away from campus: 34.8 percent were within driving distance, while 52.2 percent lived far away, though within the United States. Most of the students preferred a one-by-one virtual meeting. The following histograms illustrate the distributions.
Graph 1: Data mining efficiency
Graph 2: Interest in becoming data analyst after graduation
Graph 3: Distance from campus
Graph 4: Preference for a one-by-one virtual meeting
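The percentages reported for the categorical variables follow from simple frequency counts over the 23 respondents. The sketch below shows the arithmetic; the counts are not given in the report and are inferred here as the whole numbers that reproduce the stated percentages, so they should be read as an assumption. The `percent` helper is hypothetical.

```python
def percent(count, total):
    """Share of respondents in a category, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


N = 23

# Counts inferred from the reported percentages (an assumption, not reported data).
inferred_counts = {
    "efficiency: fair": 14,
    "efficiency: good": 2,
    "interest: much confidence": 5,
    "interest: not sure": 13,
    "distance: driving distance": 8,
    "distance: far away, within the United States": 12,
}

shares = {label: percent(count, N) for label, count in inferred_counts.items()}
```

With so few respondents, each student accounts for a little over four percentage points, which is worth keeping in mind when comparing categories.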
The following table shows significant correlations, based on results in Appendix A.
Table 3: Significant correlations

| Variable 1 | Variable 2 | Pearson correlation |
| --- | --- | --- |
| Previous data science related courses | Previous years of experience in data | 0.448 |
| Previous years of experience in data | Expected first salary | 0.494 |
| Efficiency | Interest in data analysis | 0.489 |

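Whether a Pearson correlation is significant depends on the sample size as well as on the coefficient. As a rough check on the three coefficients above, the sketch below converts each r to a t statistic with N - 2 = 21 degrees of freedom and compares it with 2.080, the two-tailed 0.05 critical value from a standard t table. The function names are hypothetical, and this is a hand calculation rather than the SPSS output in Appendix A.

```python
import math

# Two-tailed critical t value for alpha = 0.05 and 21 degrees of freedom (standard t table).
CRITICAL_T_DF21 = 2.080


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n=23, critical_t=CRITICAL_T_DF21):
    """True when |t| exceeds the critical value; the default matches N = 23 only."""
    return abs(t_statistic(r, n)) > critical_t


# Coefficients copied from the report.
for r in (0.448, 0.494, 0.489):
    print(f"r = {r:.3f}, t = {t_statistic(r, 23):.2f}, significant = {is_significant(r)}")
```

With 23 respondents, a coefficient needs to be larger than roughly 0.41 in absolute value to pass this threshold, which is why the remaining pairs in Appendix A are not flagged.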

The correlation between expected salary and level of efficiency was examined to identify the role of expected salary in motivating students in the subject. It is close to zero and not significant (r = .010, p = .963; Appendix A).
Summary
The majority of the students have sufficient background knowledge in data mining, having taken several related courses. However, they lack experience in data mining and report average efficiency. Their motivation toward a career in data analysis is low, they live far from the campus, and they prefer one-by-one virtual meetings. A one-on-one approach to learning that focuses on technology for online study is therefore recommended.
References
Tan, P., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston, MA: Pearson Addison Wesley.
Appendix A: Correlation coefficients

| Variable | Statistic | previous data science related courses | previous years of professional experience in data areas | expected salary for first job | Gender 1 | Efficiency1 | Interest1 | distance1 | virtualmeeting1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| previous data science related courses | Pearson Correlation | 1 | .448* | .122 | -.183 | -.044 | .226 | .287 | .068 |
| | Sig. (2-tailed) | | .032 | .578 | .402 | .842 | .299 | .184 | .759 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| previous years of professional experience in data areas | Pearson Correlation | .448* | 1 | .494* | -.273 | .009 | .086 | .212 | -.047 |
| | Sig. (2-tailed) | .032 | | .017 | .207 | .967 | .695 | .333 | .832 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| expected salary for first job | Pearson Correlation | .122 | .494* | 1 | -.099 | .010 | -.115 | -.111 | -.157 |
| | Sig. (2-tailed) | .578 | .017 | | .652 | .963 | .600 | .615 | .474 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| Gender 1 | Pearson Correlation | -.183 | -.273 | -.099 | 1 | -.086 | -.270 | -.111 | -.273 |
| | Sig. (2-tailed) | .402 | .207 | .652 | | .696 | .212 | .614 | .207 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| Efficiency1 | Pearson Correlation | -.044 | .009 | .010 | -.086 | 1 | .489* | -.285 | -.064 |
| | Sig. (2-tailed) | .842 | .967 | .963 | .696 | | .018 | .187 | .772 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| Interest1 | Pearson Correlation | .226 | .086 | -.115 | -.270 | .489* | 1 | -.094 | -.300 |
| | Sig. (2-tailed) | .299 | .695 | .600 | .212 | .018 | | .668 | .164 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| distance1 | Pearson Correlation | .287 | .212 | -.111 | -.111 | -.285 | -.094 | 1 | .154 |
| | Sig. (2-tailed) | .184 | .333 | .615 | .614 | .187 | .668 | | .483 |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| virtualmeeting1 | Pearson Correlation | .068 | -.047 | -.157 | -.273 | -.064 | -.300 | .154 | 1 |
| | Sig. (2-tailed) | .759 | .832 | .474 | .207 | .772 | .164 | .483 | |
| | N | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |

*. Correlation is significant at the 0.05 level (2-tailed).
