IT Methodology Research Paper Example

Chapter 2 Background 2 of Methodology

The research methodology employed for this kind of research varies with the project. While subjective projects mostly need literature-based reviews, projects that require programming need evaluative research that involves technical review. The present project is also based on evaluative research. It explores various data mining methods, and this chapter focuses on the different approaches to data mining along with the other aspects associated with it.

Research can be classified into three types: qualitative, quantitative, and a combination of both (Creswell, 2002). Quantitative research explores and describes facts numerically, while qualitative research explores facts through description and interpretation. For the present research project, both qualitative and quantitative approaches have been employed. The data was evaluated using a quantitative approach and the attributes of the data were evaluated through a qualitative approach. The qualitative approach has also been used to scrutinize the various approaches to data mining.

As per the requirements of this project, qualitative research was conducted in accordance with the objectives. This research was carried out to formulate the questions to be put to the subjects, who were the medical personnel of Abu Dhabi police hospital. The questions were based on diabetes and were planned so as to gather appropriate data for fulfilling the objectives of the present project. Qualitative research was carried out using books, the web, and other sources. The questions asked and the answers obtained are listed in appendix 1 at the end of the document. The data set obtained after analyzing the answers given by the medical staff was collected through data mining. The data was then examined. This is the quantitative aspect of the research project.
Qualitative and quantitative research methods have their own advantages and disadvantages. The utility of each method depends on the objectives of the research.

2.2. Qualitative and Quantitative Research

Advantages and disadvantages of qualitative research

The major advantage of qualitative research is that the analysis is credible and requires thoughtful processing of the data to derive a comprehensive conclusion from it. The disadvantage of this kind of research is that it is based on only a limited amount of information and that the conclusions derived from the data may vary depending on the individual’s ideas and thought process.

Advantages and disadvantages of quantitative research

The major advantage of quantitative research is that it is cheaper. The data for the research can be acquired easily and compared with other research. The disadvantage of quantitative research is that some types of data may not be easy to get, or the data obtained could be incomplete in some aspects. In this research project, it was difficult to obtain medical data due to privacy concerns. The quantitative research was based on the data obtained from the hospitals in the UAE. Sequential Language was used to alter the acquired data. This step was required in order to test the objectives of this project. Despite the difficulty in obtaining medical data, the entire project is based on collected data.

Review of literature and creation of data file

A literature review was carried out before any further research. Publications of all kinds that detailed information on data mining, including journals, books, textbooks, and online sources, were intensively reviewed. Aspects of data mining and their methods were studied and applied to information on diabetes. A data file named Diabetes.arff was designed with the information on diabetes in view.
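The .arff extension denotes the Attribute-Relation File Format, a plain-text format in which attribute declarations are followed by comma-separated data rows. The sketch below shows what such a file might look like and how a small subset of the format could be read. The attribute names and the two data rows are invented for illustration and are not taken from the project's Diabetes.arff file; `parse_arff` is a hypothetical helper, not part of any library.

```python
# Hypothetical ARFF content: attribute names and rows are invented for
# illustration and are NOT taken from the project's Diabetes.arff file.
SAMPLE_ARFF = """\
@relation diabetes_example
@attribute age numeric
@attribute glucose numeric
@attribute diagnosis {positive,negative}
@data
45,148,positive
31,?,negative
"""


def parse_arff(text):
    """Parse a small subset of ARFF: numeric and nominal attributes,
    comma-separated rows, '?' for missing values, '%' comment lines."""
    attributes = []  # (name, type_spec) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                _, name, type_spec = line.split(None, 2)
                attributes.append((name, type_spec))
            elif lower.startswith("@data"):
                in_data = True
            continue  # header lines such as @relation are skipped
        values = [v.strip() for v in line.split(",")]
        row = {}
        for (name, type_spec), value in zip(attributes, values):
            if value == "?":
                row[name] = None  # missing value
            elif type_spec.lower() in ("numeric", "real", "integer"):
                row[name] = float(value)
            else:
                row[name] = value  # nominal value kept as a string
        rows.append(row)
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
```

The `?` marker in the second row is how the format records a missing value, which is the kind of gap the data cleaning step has to deal with.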
The quality of the acquired data was examined and data mining algorithms were then applied to the file containing the collected data.

Data processing

The data on diabetes obtained from various sources, along with the information obtained from the hospitals, was reflected upon. The information for this project was collected from authentic sources using the appropriate procedures. Extra information was collected on medical data mining. The techniques employed in this project help in identifying the central issues while conducting the test. Based on the results of the research, the objectives can be identified. The objectives of this research project are as follows:

1. Understanding the techniques of data mining
2. Assessing the quality of the medical data set
3. Interpreting the data and deriving valuable and meaningful patterns and conclusions

2.3. Introduction to the Project

While the amount of data has been increasing steadily, the relevance of the data has been declining. Despite the large volume available, much of this data is of little use because the information it contains is irrelevant. A number of software packages help in analyzing the data; however, the data itself is in need of improvement. Data collection tools have enabled the collection of an immense amount of data, which would not have been possible manually, since manual collection of data by humans would be expensive and laborious. This large-scale collection has led to the creation of “data tombs” (Jiawei et al., n.d.). The use of data mining tools has lightened the burden to a large extent. The importance of data mining tools is that they can be used to identify hidden patterns in the data. Huge amounts of data can be analyzed and conclusions can be drawn based on the patterns observed. Data mining tools are of use not only in identifying the patterns but also in predicting the behavior of the data.
Therefore, such tools find wide application in business, medical science, and other forms of research, where they help in making informed decisions and planning a future course of action based on available and predicted data. Data mining, also called pattern analysis, involves the mining or extraction of information from huge databases. The information extracted is meaningful and lends insights about the raw data. Data mining tools enable the extraction of information from unrefined datasets. Because of its utility, data mining is also called knowledge discovery. Figure 1 illustrates the concept of data mining from unrefined data sets.

Figure 1

An example of a programming language that is used for managing data is SQL. It is a query language that helps in retrieving information from data. It also enables the manipulation of data in a DBMS (Database Management System). However, it appears that in view of the ever increasing need for more cryptic information, the present queries are inadequate. Figure 2 shows how data is retrieved from databases through queries.

Figure 2

From the figure, it can be understood that with the help of an SQL query request, data can be retrieved from the database. The SQL request extracts the information from the database and gives it to the user. However, the user cannot access the actual data. This disadvantage is absent in data mining.

A theory on data mining, called market basket analysis, is a good example of predicting behavior. For instance, buying an item in a certain group increases the chances of buying an item in another group. Market basket analysis investigates the correlation between items that are bought. The correlation is usually written as a rule, for instance:

IF {honey, no butter} THEN {Bites}

This method of analysis is easy for a small number of items but becomes increasingly complex as the number of items grows.

Data mining is applicable to a variety of applications.
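To make the market basket rule above more concrete, the following sketch computes two measures commonly attached to such rules: support (the fraction of all baskets in which the whole rule holds) and confidence (the fraction of baskets satisfying the IF part that also satisfy the THEN part). The baskets are invented for illustration, and `rule_stats` is a hypothetical helper name. It also hints at why the analysis becomes complex: with n distinct items there are 2^n possible item combinations to consider.

```python
def rule_stats(transactions, present, absent, consequent):
    """Return (support, confidence) for the rule:
    IF every item in `present` is in the basket and no item in `absent` is,
    THEN every item in `consequent` is in the basket."""
    present, absent, consequent = set(present), set(absent), set(consequent)
    # Baskets that satisfy the IF part of the rule.
    matches_if = [t for t in transactions
                  if present <= t and not (absent & t)]
    # Of those, baskets that also satisfy the THEN part.
    matches_both = [t for t in matches_if if consequent <= t]
    support = len(matches_both) / len(transactions)
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence


# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"honey", "bites"},
    {"honey", "bites", "milk"},
    {"honey", "butter"},
    {"honey"},
    {"butter", "bites"},
]

# Rule from the text: IF {honey, no butter} THEN {bites}
support, confidence = rule_stats(BASKETS, {"honey"}, {"butter"}, {"bites"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here three baskets contain honey without butter and two of those also contain bites, so the rule holds in two of the five baskets overall.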
For instance, data mining has been employed by automotive companies to understand the buying patterns of consumers from various locations, in order to target potential consumers with brochures, and in the analysis of historical information on nuclear power plants, in order to estimate the chances of a nuclear disaster occurring in the absence of appropriate precautions (Thuraisingham, 1999).

2.4. Data Mining Approaches

Data mining, as stated by Berry and Linoff (2000), is a methodology that accomplishes the task of extraction of information. Two approaches exist for data mining: the bottom-up approach and the top-down approach (Thuraisingham, 1999). A third, the hybrid approach, is a combination of these two. The top-down approach involves the use of specific criteria expressed in a structured language. The criteria are then tested downward. If a point does not correlate with the criteria used, it is corrected and the new conditions are then updated. The bottom-up approach, on the other hand, does not start from any criteria; it works from the details of the data and creates everything anew. The initial step involves the assessment of data, followed by data filtering, and then the creation of the criterion. The approach can either be direct or indirect. The outputs can either be unknown (unsupervised/undirected learning) or known (supervised/directed learning). The data mining technique applies various processes in order to derive knowledge based on data.

Figure 3 Processes or techniques involved in data mining

1. Data pre-processing (data is keenly evaluated before beginning data mining):
Data Cleaning: This is performed to remove any unwanted data, also called unnecessary noise.
Data Integration: This is performed to integrate or join data from different sources.
Data Selection: This is performed to acquire relevant data from the database.
Data Transformation: This is performed by taking the data through conversion sequences for the miner. It is carried out through aggregation operations or summary operations.

2. Data mining and post-processing (data is mined and presented in a usable form):
Data Mining: This is performed to extract hidden patterns in the given data.
Pattern Evaluation: This is used to evaluate the patterns and extract valuable ones that give useful knowledge.
Knowledge Presentation: This is performed in order to present the extracted knowledge in a more comprehensible way.

Each of the topics above is discussed in detail in the following sections.

2.5. Data Pre-Processing Techniques

Before data is used for mining or extracting useful information, it has to be processed in order to facilitate the mining process. Preparation of data, also called data pre-processing, accounts for about 50-80% of the entire data mining procedure (Fayyad et al., 1996). During this process, the data is first cleaned and converted into the appropriate form, after which it is ready for the extraction of useful knowledge or hidden patterns. The various steps involved in data pre-processing are discussed as follows:

2.5.1. Data cleaning procedure

In this process, the data is analyzed to look for mistakes and errors. Any values that are missing or inconsistent/irrelevant are replaced. Data cleaning therefore mostly caters to two issues, namely missing values and irrelevant data.

Missing values

Usually, when large amounts of data are collected, some missing values are always present. The missing values could relate either to a nominal attribute or to a numerical value in the data set. Missing values are adjusted in the following ways:

1. Missing values are filled manually: This procedure is quite useful but is very laborious and time consuming, especially if the amount of data is very large and there are a lot of missing values.
2. An attribute mean is applied to fill the missing value: This procedure uses the mean of the available values of an attribute to fill in the values that are missing for that attribute.
3. Omitting the tuple: This procedure ignores the tuple which contains the missing values. This is done especially if the class value itself is missing. The procedure is also highly useful when a tuple has many attributes whose values are missing.
4. Filling the missing value using the value closest to it: This procedure is performed with the help of regression analysis in order to fill in the missing values through reasoning. The missing value is in fact predicted with the help of a decision tree from a dataset that is similar to the one with the missing values. The value closest to the missing value is chosen and filled in its place. This method is very frequently employed by data miners.
5. All missing values are filled with a single constant value: In this method, the missing value is filled in with a global constant, which could be anything, such as a label meaning "unknown". The drawback is that the data mining process would treat this constant as an interesting value, so filling all the missing values throughout the dataset with the same constant would result in imperfect data.

Erratic or irrelevant data

In a dataset, there may be some attributes that are of no importance and are irrelevant to the objectives of the procedure (Dunham, 2003). This is called irrelevant data. This kind of data is also processed in the data cleaning procedure before data mining.

Noisy data

Noise, or noisy data, is data that results from improper calculation.
Various techniques can be employed for eliminating such erroneous noisy data prior to data mining. The noise could be due to collection error, arbitrary error, or variation in the calculation of the variable. It can be removed with the help of various procedures such as binning and regression.

Binning

Consider data for price, sorted as follows: 2, 4, 10, 14, 16, 24, 25, 30, 35

Figure 4

Binning is performed on the given data as shown in figure 4. The data is smoothed by consulting the neighboring values, after the data has been arranged in portions containing an equal number of values. These portions are called bins. First, the sorted price values are partitioned into equal-frequency portions, giving three bins with three data values each: (2, 4, 10), (14, 16, 24), and (25, 30, 35). To perform smoothing by bin means, each value is substituted by its bin’s mean value. For example, for the data 14, 16, and 24, the mean value is 18, so the value 18 is used to substitute all the values in bin 2. A similar procedure is performed for the remaining bins. Alternatively, in smoothing by bin boundaries, the smallest and largest values in each bin are taken as the bin boundaries, and each remaining value is replaced with the nearest boundary value.

Regression

Regression fits a function to a dataset in order to estimate or predict a number. The data should be describable by a linear or non-linear function. The function given below is derived by using the gradient technique.

Y = f(x, a)

The quality of regression depends on the characteristics of the data analyzed. A correlation exists between the values of x and y, and the usefulness of the fitted function depends on the strength of that relation.

2.5.2. Data integration and selection procedure

After cleaning the data, the data integration procedure is employed. This procedure involves the integration of various datasets into a single dataset.
This is followed by data selection, in which the important data is selected for mining. In this process, the data is divided into various classes to facilitate data mining.

2.6. Data Mining, Evaluation of Pattern, and Data Reduction

After the data is pre-processed (cleaned, integrated, and selected), the data mining procedure is performed. In this process, knowledge based on hidden patterns in the data is extracted. This is followed by pattern evaluation, in which the key patterns are identified in order to gather relevant domains from the extracted data. Sometimes, the data finally extracted is very large and may need to be broken down into smaller portions or data sets. In such a case, data reduction techniques are employed. Various techniques like filtering can be employed to reduce the size of the extracted data without destroying its quality (Daedalus 1998).
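As a concrete illustration of the binning procedure from Section 2.5.1, the sketch below applies equal-frequency binning to the price data given there (2, 4, 10, 14, 16, 24, 25, 30, 35) and then smooths the bins in two ways: by bin means and by bin boundaries. The function names are hypothetical, and the tie-breaking rule (a value exactly halfway between the two boundaries goes to the lower one) is an assumption, since the text does not specify one.

```python
PRICES = [2, 4, 10, 14, 16, 24, 25, 30, 35]


def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin with the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer of its bin's min and max.
    Assumption: ties are resolved toward the lower boundary."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are already sorted
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


bins = equal_frequency_bins(PRICES, 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)

print(bins)           # three bins of three values each
print(by_means[1])    # bin 2: every value becomes the bin mean, 18
print(by_boundaries)  # each value snapped to its nearest bin boundary
```

For bin 2 this reproduces the mean of 18 worked out in the text, and smoothing by boundaries turns that bin into 14, 14, 24, because 16 lies closer to 14 than to 24.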
