Machine Learning Algorithms and Tools

Machine Learning

Machine learning is a branch of computer science in which software and algorithms allow a computer to model data and to use that data to improve its accuracy at a given task. Data is required because, unlike humans, who learn from past experience, machines cannot learn in the same fashion. They need a concrete learning mechanism, and that can only be provided by giving the machine data that models a specific situation; the machine then learns from this data set and can later be used to predict outcomes for unknown data based on what it has learned. Machine learning therefore focuses on automatically learning complex patterns and then making intelligent decisions based on the data provided. Many machine learning techniques are used in the real world, but our focus here is on two broad approaches: supervised learning and unsupervised learning.

Supervised Learning

Supervised learning is a technique that defines the effect that one set of observations, called inputs, has on another set of observations, called outputs (Valpola). The inputs are assumed to be at the beginning and the outputs at the end of the causal chain. The model works by training on a collection of records called the training set. Each record in the training set contains several attributes, one of which is designated as the categorical variable, i.e. the one that needs to be predicted. The prediction is made by finding a model for the categorical attribute as a function of the values of the other attributes. Once this learning has taken place, a previously unseen data set is given to the machine so that it can assign classes to the categorical variable, and the goal is to assign these classes as accurately as possible. Usually the available data is divided into two segments, a training segment and a test segment: the training set is used to build the model, while the test set is used to validate its results.

An example of supervised learning is reducing the direct marketing costs incurred by a company. The goal is to cut mailing costs by targeting the specific set of customers who are most likely to buy a new cell phone product. The approach is to first use data gathered for a similar product and collect various demographics, i.e. age, lifestyle, income, interaction with the company, technological knowledge and so on. Once all these attributes have been collected, they are used as input attributes to learn a classifier model. The model is then used to make predictions for the current customer base, and only the customers with a high probability of buying are targeted (Linoff, 1997).

Unsupervised Learning

In unsupervised learning, the aim is to show how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. We therefore need to determine how the data is organized and then use this learning for decision making, predicting future inputs, efficiently communicating the inputs to another machine, and so on. Unsupervised learning can thus be thought of as finding patterns in the data. As opposed to supervised learning, it does not employ explicit target outputs or evaluations of a set of inputs; instead it builds a representation of the input from the data itself.
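To make the train-and-validate workflow of supervised learning concrete, the following is a minimal sketch using the Java API of Weka (the tool discussed later in this paper). The file names, the assumption that the categorical variable is the last attribute, and the choice of a decision tree are illustrative only, not details taken from the study.

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HoldoutValidation {
    public static void main(String[] args) throws Exception {
        // Training and test segments (hypothetical .arff files).
        Instances train = new DataSource("customers-train.arff").getDataSet();
        Instances test  = new DataSource("customers-test.arff").getDataSet();

        // The categorical variable to predict is assumed to be the last attribute.
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Build the model on the training segment only.
        Classifier model = new J48();   // a classification tree
        model.buildClassifier(train);

        // Validate the model on the held-out test segment.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(model, test);
        System.out.println(eval.toSummaryString());
    }
}
```

The printed summary reports how accurately the classes were assigned on the records the model never saw during training, which is exactly the role of the test segment described above.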
An example of unsupervised learning in use is the problem of a designer who needs to produce T-shirts for all sizes and faces the dilemma of which sizes to offer. The designer cannot offer one size that fits all, nor can every individual have a tailor-made T-shirt. By using clustering, the designer groups people of similar sizes into distinct categories such as “small”, “medium”, “large” and “extra-large”.

In the real world there are numerous problems that call for such techniques. One of them occurs when people apply for credit cards. Since credit card fraud is prevalent in most parts of the world, banks take extreme precautions in handing these cards over to customers. Every day thousands of new applicants apply for a credit card, and for banks every customer is important. They therefore need to estimate accurately the probability that a customer will default, and so should not be issued a card, as opposed to the customers who are likely to be good business prospects. This is a tricky decision, as it can either bring business to the bank or cause losses for it. The problem is solved by taking into account the data of the individual who is applying for a credit card. The data has various attributes, ranging from age, sex, marital status and education to previous credit history and the number of bank accounts. A model is created from the data of individuals already on hand to predict the outcome for a new applicant, i.e. whether that person will be a good prospect for the bank or is more likely to default or commit fraud. Machine learning techniques are then used to predict the outcome for each individual customer: people who are recommended by the model are usually issued a credit card, while applications for which the software predicts a high risk of fraud are declined. Using such techniques is far more beneficial for banks than the previous mechanism of a risk-analysis department that assessed the risk of individual customers manually.

The technique used to predict this outcome is called classification. Classification uses different algorithms to predict the outcome of a categorical variable based on what the machine has learned from a set of attributes in the same data set. Numerous algorithms are used for classification; the best known are classification trees, naïve Bayes and neural networks.

Running a sample document in the WEKA tool

The sample document is related to the credit card problem discussed above. The goal of this sample is to run the tool on the data so that it learns the characteristics of the applicants who are more likely to default. The data set consists of 31 variables associated with the details of an applicant who applies for a credit card: id (used to identify a client), sex, age, marital status, education, phone number, income, profession code and so on. This data is used as the training set for the study, and the prediction is made on a similar data set with different records. The training set comprises 50,000 entries, while the test is performed on 10,000 records; for each of the 10,000 records we need to predict whether the outcome will be favorable or not. The results were compiled after performing an analysis on the training data.
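The following sketch shows how such a classifier could be built and applied through Weka's Java API. The file names credit-train.arff (the 50,000 labelled records) and credit-test.arff (the 10,000 records to be scored) are assumptions for illustration, as is the use of naïve Bayes in place of whichever classifier the study actually used; the class attribute is assumed to be the last column.

```java
import java.util.Arrays;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CreditScoring {
    public static void main(String[] args) throws Exception {
        // Load the 50,000-record training set (hypothetical file name).
        Instances train = new DataSource("credit-train.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1); // class = last attribute

        // Learn a classifier on the training records.
        NaiveBayes model = new NaiveBayes();
        model.buildClassifier(train);

        // Score the 10,000 unseen applicants.
        Instances applicants = new DataSource("credit-test.arff").getDataSet();
        applicants.setClassIndex(applicants.numAttributes() - 1);
        for (int i = 0; i < applicants.numInstances(); i++) {
            // Probability distribution over the class values declared in the ARFF header.
            double[] p = model.distributionForInstance(applicants.instance(i));
            System.out.printf("applicant %d: %s%n", i, Arrays.toString(p));
        }
    }
}
```

The printed probabilities correspond to the risk estimates on which the issue-or-decline decision described above would be based.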
We had to determine which attributes were important and which were not, so we noted down probabilities for different combinations of attributes and finally selected the set of attributes that showed the minimum tolerance. After the unwanted attributes were identified, the training set was altered: the columns holding those attributes were deleted. The software was then fed a file containing only the relevant attributes and their records. Once the learning was over, we gave the software another file in which it had to predict the outcome for the customers. This file had 10,000 entries; for each entry the software determined the probability of success and failure, and on the basis of these probabilities it concluded whether the customer should be given a card or not.

Main objectives of WEKA

The WEKA tool consists of a number of visualization tools and algorithms that are used for data analysis and predictive modeling, with graphical user interfaces for easy modeling and representation of data. The recent release of the software, 3.6.0, is based on the Java platform and has many algorithms and techniques built in for data mining purposes. Several factors account for the success of this version: it is free; it is portable and operating-system independent because of the Java platform; it is easy to use and involves the user in the process by showing graphs and bar charts of the different attributes; and above all it houses a large collection of data-processing and modeling techniques. The tasks that can be performed with the tool include data mining, clustering, classification, regression and association rule mining. All of these techniques can be applied provided the data is fed into the tool in its standard input format, the .arff file format. Once the data has been loaded, an analysis can be performed using a specified technique; for example, the naïve Bayes algorithm can be used to predict the outcome of a categorical variable. Hence, the major functionality of the tool is to model data and to perform data mining activities on a set of input data.

Functionality of the WEKA tool

The WEKA tool has four major applications associated with it. When a user opens the interface, four applications can be worked with: the Explorer, the Experimenter, the Knowledge Flow and the Simple CLI. The functionality of the individual applications is as follows.

Explorer

The Explorer application has six tabs, each exposing a different piece of functionality and giving the user access to a different component of the tool. The first tab is the Preprocess tab, which has facilities for opening files from the computer, from the internet (using a URL) and from a database. Once a file is loaded, it shows the attributes in the file, their respective statistics and a bar chart that tabulates the different values of a selected attribute. The user can also apply filtering algorithms in this panel. The second tab is the Classify tab, with which a user can perform classification and regression and then view the results in the space provided in the GUI. The panel also enables the user to estimate the accuracy of the technique used, to visualize erroneous predictions, to create ROC curves, and so on.
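As an illustration of the attribute-deletion step described above, the sketch below uses Weka's Remove filter from the Java API to drop unwanted columns from a training file before learning, in the same way the Preprocess panel applies filters. The file name and the attribute indices (here the id and phone-number columns) are assumptions made for illustration.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class DropAttributes {
    public static void main(String[] args) throws Exception {
        // Load the training file (hypothetical name) in Weka's .arff format.
        Instances train = new DataSource("credit-train.arff").getDataSet();

        // Delete the columns judged unimportant during the analysis.
        // The indices (1 = id, 6 = phone number) are purely illustrative
        // and are 1-based, as in the Explorer's Preprocess panel.
        Remove remove = new Remove();
        remove.setAttributeIndices("1,6");
        remove.setInputFormat(train);
        Instances reduced = Filter.useFilter(train, remove);

        // The reduced set now contains only the attributes the model will see.
        System.out.println("Attributes before: " + train.numAttributes());
        System.out.println("Attributes after:  " + reduced.numAttributes());
    }
}
```

The same configured filter can be applied to the prediction file as well, so that the training and test records keep identical attribute layouts.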
After an analysis has been performed, the resulting statistics are also viewed in this panel. A very important part of the result, the confusion matrix, is shown here as well; the accuracy of the model can be judged very effectively by examining the confusion matrix of a given result. The third tab is the Cluster tab, which is used for clustering, i.e. unsupervised learning, and gives access to clustering techniques such as the k-means algorithm. The fourth tab is the Associate tab, used for identifying interrelationships between the attributes in the data. The next tab is the Select attributes tab, used for identifying the most predictive attributes in a dataset, and the last tab is the Visualize tab, used for visualizing individual attributes in graphs and scatter plots.

Experimenter

The Experimenter application allows comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets. The user is required to give a destination for the file that will be created after the comparison, to choose the datasets and algorithms, and finally to set the experiment type and the iteration control. Once this has been done, the comparison is run from the Run tab, and the results are examined in the Analyze tab.

Knowledge Flow

The Knowledge Flow provides the same functionality as the Explorer, but with a different approach. The Explorer works through a browse-and-click strategy, whereas the Knowledge Flow uses a modeling approach: the user models the activities with the components available in the GUI. Each component can be dragged onto the work area and then linked to others, and the final functionality results from the connections between the different components.

Simple CLI

The Simple CLI is the command line interface for the Weka tool.
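The clustering exposed by the Cluster panel can also be driven from the Java API. The sketch below is a minimal example rather than the study's actual procedure: it runs the k-means algorithm (SimpleKMeans) with four clusters, in the spirit of the T-shirt sizing example discussed earlier; the measurements.arff file name is an assumption.

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SizeClusters {
    public static void main(String[] args) throws Exception {
        // Hypothetical data set of body measurements, one row per person.
        Instances people = new DataSource("measurements.arff").getDataSet();

        // k-means with four clusters: small, medium, large, extra-large.
        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(4);
        kmeans.buildClusterer(people);

        // toString() reports the cluster centroids and sizes,
        // much like the output shown in the Explorer's Cluster panel.
        System.out.println(kmeans);

        // Assign the first person in the file to one of the four groups.
        System.out.println("Assigned cluster: "
                + kmeans.clusterInstance(people.instance(0)));
    }
}
```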
Bibliography

Linoff, G. & Berry, M. (1997). Data Mining Techniques.

Valpola, H. (n.d.). Supervised Learning. Retrieved December 23, 2009, from http://www.cis.hut.fi/harri/thesis/valpola_thesis/node34.html
