Data Classification Using Weka Software - Lab Report Example



A REPORT ON DATA CLASSIFICATION USING WEKA SOFTWARE

Introduction

In general terms, data mining, or knowledge discovery, is the process of giving meaning to a set of data through proper analysis. The data may come from a range of sources: it may be free text gathered from websites or social media sites, or structured data held in a structured data repository. Various data mining tools exist to derive sense from a given data set. Data mining is a widely used technique for dealing with complexity in data, especially unstructured data.

Weka is one of the easiest to use yet most complete data mining tools. It is a machine learning tool that offers various algorithms for data classification and other machine learning tasks. Weka's algorithms (the classifiers) can be called from other programs, for example from Java or Python code. Weka can also be used directly as a standalone application, which is free software licensed under the GNU General Public License (Hall et al., 2009). Weka can perform the following machine learning tasks:

  • Data preprocessing: the set of operations that Weka lets users perform on the data before the actual machine learning steps.
  • Regression calculation: the use of statistical regression equations to calculate the relationships among the data presented to Weka. This is an important process in Weka's machine learning, as other processes such as data classification and clustering depend on it.
  • Data classification: the process of organizing (classifying) data into categories for easy, efficient, and effective use. A properly designed classification algorithm makes data available for business use; in other words, it makes the data easy to retrieve and analyze.
  • Data clustering: the process of putting data into several different groups with less attention to the inner details. Compared to classification, clustering is a simpler way of placing data into cluster classes.
  • Data visualization: as the name suggests, a way of obtaining a pictorial view of a given data set. In many cases, visualization is done through visual reports. The visual reports are the various statistical tools used to identify trends and make observations on the data.
  • Data association: the relationships among data sets. These associations are important in determining the classification classes of a given data set. Weka makes use of various association rules to classify data (Hall et al., 2009).

Data Description

Data description refers to the information about the data under study. It is closely related to the term metadata, which is data about data. In data mining and classification, data description plays a major role in determining the clusters or classes to which the data belongs. Weka is strict about the data types, or file types, that it can process, so the files presented to Weka must be in one of the supported formats. Weka supports file formats such as .arff, .csv, and C4.5, among others. Weka users therefore need some knowledge of data and file conversion to get the most out of the tool. For this assignment, the data files provided were all .csv files, which are supported, so there was no need to convert them to a different format; the data was used as provided. The data sets were imported into Weka one at a time. The data files provided for this assignment were sub-0.csv, sub-1.csv, train.csv, and test.csv.
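
Although no conversion was needed here, the relationship between the .csv and .arff formats mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library and is not part of Weka. The function name csv_to_arff, the relation name "bank", and the sample columns age and job are assumptions made for illustration; only the output variable y comes from the assignment. The sketch ignores quoting, missing values, and attribute names containing spaces.

```python
import csv
import io


def csv_to_arff(csv_text, relation="bank"):
    """Convert CSV text with a header row into a minimal ARFF string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            # Every value parses as a number: declare a numeric attribute.
            lines.append(f"@attribute {name} numeric")
        else:
            # Otherwise declare a nominal attribute listing its distinct values.
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


# Hypothetical two-row sample; the real files have many more attributes.
sample = "age,job,y\n30,admin,no\n45,technician,yes\n"
arff = csv_to_arff(sample)
print(arff)
```

The ARFF header declares each attribute's type up front, which is the main difference from CSV: a nominal attribute such as y carries its allowed values ('no' and 'yes') in the file itself.
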
The data set is bank client data containing a number of attributes. It came with a description file explaining what each attribute represents.

Input

The input to the classifier was the data set with its associated attributes. The bank client data had to go through training and classification to obtain the most appropriate results. The data underwent a number of transformations, such as splitting, to achieve a higher level of classification accuracy. Some columns were also converted to allow proper classification.

Output

After running the data through a number of classifiers, the required output is a variable y, which answers the question of whether a client has subscribed to a term deposit. The output is a binary value, either 'yes' or 'no'.

Results

Below are some of the screenshots showing the Weka results obtained for the classifier algorithms chosen for this experiment. The experiment started by transforming the data and splitting the data set into two parts.

Data Classification

Neural Networks

In Weka, the neural network classifier is MultilayerPerceptron. The most commonly used neural network architecture is feed-forward. It is characterized by an input layer, a hidden layer, and an output layer. The output signal produced for an input vector, which holds the attributes to be classified, indicates the class to which the object belongs. In this case, because the output was a binary variable (0, 1), the neural network output was interpreted as a probability (see the screenshots in the Results section). The value at unit a corresponds to the probability that the given input vector belongs to the corresponding class. The screenshots above give specific details of the logic and equations applied by the neural network. The output lists the node types, the inputs, and the weights.
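
The feed-forward computation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Weka's MultilayerPerceptron implementation: the weights, biases, and inputs are hypothetical values chosen by hand, and the output unit is assumed to be a sigmoid so that its value can be read directly as a probability.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> one hidden layer -> one output unit."""
    # Each hidden node applies a sigmoid to its weighted sum of the inputs.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations the same way.
    activation = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(activation)  # assumed sigmoid output, read as P(y = 'yes')


# Hypothetical values: two normalized input attributes, two hidden nodes.
inputs = [0.5, 1.0]
hidden_weights = [[0.4, -0.6], [0.3, 0.8]]
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.7]
output_bias = -0.3

p_yes = forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
label = "yes" if p_yes >= 0.5 else "no"
print(p_yes, label)
```

Thresholding the probability at 0.5 turns the continuous output into the binary answer; a different threshold would trade off the two kinds of classification error.
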
Because the network topology was not altered, all hidden-layer nodes are sigmoid units and the output-layer nodes are linear units. From the above results, the answer to our question according to MultilayerPerceptron is “No”, that is, the binary result 0.

Support Vector Machines

In Weka, the support vector machine classifier implements the sequential minimal optimization algorithm, hence the name SMO. To use it, we went to the Classify tab, opened the functions group, and chose SMO. One advantage of this classifier for this assignment is its ability to convert nominal attributes to binary ones. In addition, it normalizes all attributes by default. The output coefficients are therefore based on the normalized data rather than the original data, which is important to keep in mind when interpreting the classifier.

The screenshots above show the results of the classification performed by the support vector machine algorithm in Weka. The algorithm can be used for both classification and regression; in this experiment it was used only for classification. As the results show, the algorithm automatically normalizes the data, giving more precise and accurate results. Because the resulting coefficients are obtained from normalized rather than raw data, the result is easier to interpret. The interpretation of the above result is a “No” answer to the question.

Discussion

Weka is a machine learning tool with various algorithms for data classification. It is easy to use, and most goals can be achieved with a few button clicks. It also has a Simple CLI, which gives access to most of the functionality. In this assignment, the graphical user interface was used. Weka also has three further interfaces:

  • Explorer: provides the graphical front end to Weka's components and routines.
  • Experimenter: allows the user to build classification experiments in Weka.
  • Knowledge Flow: an alternative to the Explorer as a graphical front end for Weka's main algorithms.

The data set for the assignment was imported into Weka for classification. The next stage after importing the data was filtering: the imported data was passed through filters to clean it up. Weka's filters cover normalization, re-sampling, discretization, transforming and combining attributes, attribute selection, and more. For this task, data transformation was performed. This involved the NominalToBinary transformation, which was necessary for the MultilayerPerceptron algorithm. This algorithm has no automatic way of normalizing the data. Preprocessing is therefore a very important step when classifying data with Weka.

The Classifiers

Two classifiers were applied in this experiment: Neural Networks and Support Vector Machines. Each classifier gave a different result depending on the parameters set. The differences lay mainly in accuracy and bias (threshold). SVM showed a high level of over-fitting compared to NN. The SVM classifier showed the highest level of accuracy and speed.

Reflection on the Assignment

After the provided data set had been classified with both algorithms, Neural Network and Support Vector Machines, the differences between them were clear. SVM was observed to complete the same number of epochs far faster than the neural network. The output was also easier to interpret with SVM than with the neural network. This made Support Vector Machines the better choice for this specific task.

Conclusion

The assignment was successful, and the classification was achieved using the Weka tool.
The algorithms used gave different results, because the classifiers differ in several respects. Weka makes it easy to apply the various classification algorithms and compare their results. Using Weka made the assignment feasible and allowed a great deal of processing to be done on the data. The recorded results are the exact outcomes from Weka.

Reference

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I.H., 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), pp. 10-18.