Data Mining and Knowledge Discovery in Database Research Paper Example | Topics and Well Written Essays

KNOWLEDGE DISCOVERY IN DATABASE

Introduction

Knowledge Discovery in Database (KDD) is an automatic, organized process of identifying valid and useful patterns in large and complex sets of data. Data mining is the core of this process: it involves inferring algorithms that explore the data, develop a model, and discover previously unknown patterns (Dai, Liu & Smirnov, 2012). The method is mainly used for understanding phenomena from data, for analysis, and for prediction. Data mining has grown in importance because the abundance of data now available calls for knowledge discovery. At present, no single process for handling data is considered superior to the others. The aim of this research is to identify the right process for data mining and to organize the important methods developed in the field into a unified and coherent catalog, presenting the performance evaluation approaches and techniques as well as the cases and software tools that use them (Dai, Liu & Smirnov, 2012). The research also identifies developments and challenges for the next generation of data science.

Current trends

KDD has evolved, and continues to evolve, from the intersection of research fields such as machine learning, AI, knowledge acquisition for expert systems, data visualization, pattern recognition, databases, statistics, and high-performance computing. The unifying goal is extracting high-level knowledge from low-level data in the context of large data sets. The data mining component of KDD currently relies heavily on known techniques from machine learning, pattern recognition, and statistics to find patterns in data during the data mining step of the KDD process (DATABASE Editorial Board 2010, 2010). A major question that arises is how KDD differs from pattern recognition, machine learning, and other related fields.
The answer is that these fields provide some of the data mining methods used in the data mining step of the KDD process. The KDD field is still at an early stage of development, in the sense that its foundations are still being worked out. As the art expands, the understanding and automation of the nine steps of the process, and of the interrelations among them, must expand with it. For that to happen, the spectrum of KDD problems needs better characterization and definition. The terms used in KDD are not well defined with respect to which methods they cover, which types of problem those methods solve best, and what results can be expected. The abundance of data is what makes knowledge discovery important today (DATABASE Editorial Board 2010, 2010), yet no single process for handling data is considered superior to the others.

There is already evidence of results achieved with KDD, but a gap remains: there are still no estimates of the field's potential. This basic analysis should be carried out and used to set directions for future research and implementation. It should include a full taxonomy of the nine steps of KDD. Taxonomies of data mining (DM) methods exist, but none has been developed for the nine steps. Such a taxonomy should contain the methods appropriate for each step as well as for the whole process. Meta-algorithms are also needed, that is, algorithms that examine the characteristics of the data in order to determine the best methods and parameters. Benefit analysis is another trend, aimed at understanding the effect of potential KDD results on the enterprise.
Expanding the database for data mining inference to include data obtained from images, pictures, audio and video is a current trend in the field. Other developments include the ability to employ data mining methods seamlessly and effectively on databases located at various sites, expanding the knowledge base for the KDD process, and expanding data mining reasoning.

Barriers in the process

In many application domains, the generalization error of even the best methods is far above the training set error. The question is whether it can be improved and, if so, how. Another problem is deciding which inducer to use for a specific problem. To be more specific, the performance measure needs to be defined appropriately for each problem, since some commonly accepted measures are not sufficient. The dilemma over which method to choose grows when other factors, such as comprehensibility, are taken into consideration. In a specific domain, a neural network may outperform a decision tree in accuracy, yet from the standpoint of comprehensibility decision trees are considered superior.

Induction is one of the central problems in many disciplines, such as machine learning, pattern recognition, and statistics. What distinguishes data mining from traditional methods, however, is its scalability to very large sets of varied types of input data. Scalability means working in an environment with a high number of records, high dimensionality, and a high number of classes or high heterogeneity. Very large amounts of data, once the dream of many analysts, have become a problem. Obtaining the desired volume of information depends on the application. Information-intensive organizations are expected to accumulate large amounts of raw data every two years.
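The gap between training error and generalization error mentioned at the start of this section can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the cited sources: the synthetic one-dimensional data, the 20% label noise, the choice of a 1-nearest-neighbor inducer, and the helper names `make_data`, `nearest_neighbor_predict`, and `error_rate`.

```python
import random


def make_data(n, rng, noise=0.2):
    """Hypothetical data: the label is 1 when x > 0, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0)
        if rng.random() < noise:
            y = 1 - y  # simulate a noisy record
        data.append((x, y))
    return data


def nearest_neighbor_predict(train, x):
    """Return the label of the training example whose x value is closest (1-NN)."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]


def error_rate(train, data):
    """Fraction of examples in `data` that the 1-NN inducer built on `train` gets wrong."""
    wrong = sum(1 for x, y in data if nearest_neighbor_predict(train, x) != y)
    return wrong / len(data)


rng = random.Random(42)
train = make_data(200, rng)
held_out = make_data(200, rng)

# Each training point is its own nearest neighbor, so the inducer memorizes the noise.
train_error = error_rate(train, train)
# On unseen records the memorized noise no longer helps.
held_out_error = error_rate(train, held_out)
print(f"training error: {train_error:.3f}, held-out error: {held_out_error:.3f}")
```

The inducer reproduces its training set perfectly, noise included, yet makes mistakes on records it has not seen. This is why a performance measure computed on the training set alone is not enough when choosing an inducer.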
Solution of the problems

A taxonomy should be formed that contains the methods appropriate for each step as well as for the whole process. Meta-algorithms should be developed that examine the characteristics of the data in order to determine the best methods and parameters. Benefit analysis aims at understanding the effect of potential KDD results on the enterprise. The database for data mining inference is being expanded to include data obtained from images, pictures, audio and video. Other developments include the ability to employ data mining methods seamlessly and effectively on databases located at various sites, expanding the knowledge base for the KDD process, and expanding data mining reasoning. These solutions are being considered in the current stage of development of KDD and data mining.

Knowledge Discovery in Database process

The Knowledge Discovery in Database process consists mainly of nine steps and is interactive and iterative. It has been described as an artistic process, in that one cannot present a single formula or a complete taxonomy of the right choices for each step and application type. The process begins with determining the goals and ends with implementing the discovered knowledge.

The first step is developing an understanding of the application domain. This step sets the scene for understanding what is to be done in the many decisions that follow. The people in charge of a Knowledge Discovery in Database project need to understand and define the objectives of the end user and the environment in which the knowledge discovery will take place. An understanding of the goals is useful throughout the first three steps. The next step is the selection and creation of a data set on which discovery will be performed.
Once the goals have been defined, the data to be used must be determined. This involves finding out what data is available, obtaining any necessary additional data, and integrating all of it into one data set for knowledge discovery. The step is vital because data mining learns and discovers from the available data (Doreswamy & Hemanth, 2011), which forms the evidence base for constructing the models. The tradeoffs made in this step are one place where the interactive and iterative nature of Knowledge Discovery in Database shows.

The next step is selecting a target data set, or a subset of data samples, on which study and discovery will be carried out. In this stage the reliability of the data is enhanced. It includes data cleaning, in which missing values are handled and unnecessary data is removed (Doreswamy & Hemanth, 2011). This may involve statistical methods or algorithms. How much attention this step receives depends on many factors.

The next step is data transformation, which removes unwanted variables. This step is very important for the entire KDD process and is always project specific. The process then analyzes which useful features can represent the data, depending on the objectives or the task to be performed. Data mining has two main goals: prediction and description. Prediction is referred to as supervised data mining, while descriptive data mining is referred to as unsupervised.

The sixth step is choosing the data mining algorithm, that is, selecting the specific method to be used in searching for patterns.
For example, in weighing precision against understandability, the former is better served by neural networks, while the latter is better served by decision trees. For each strategy of meta-learning there are several possible ways to achieve it. Meta-learning aims to explain what makes a data mining algorithm effective or not on a specific problem. The approach thus helps identify the conditions under which a data mining algorithm is most suitable.

The next step is employing the data mining algorithm. This is the final implementation step: the algorithm is run several times until a satisfying result is achieved, by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.

The next stage is evaluation and interpretation of the mined patterns with respect to the goals defined in the first step. Here the preprocessing steps are considered with respect to their effect on the results of the data mining algorithm. The main focus is on the comprehensibility and usefulness of the results.

The last stage is using the discovered knowledge. The knowledge is now ready to be incorporated into another system for further action. It becomes active in the sense that changes can be made to the system and their effects measured. There are, however, many challenges at this stage, such as the loss of the laboratory conditions under which the analysis was carried out.

KDD models and Human Interaction

A data mining method consists mainly of three components. The first component is the model. Two factors are relevant here. One is the function of the model, such as clustering or classification.
The other factor is the representational form of the model, for example a linear function of multiple variables or a Gaussian probability density function (DATABASE Editorial Board 2010, 2010). The model contains parameters that are determined from the data. The second component is the preference criterion (Doreswamy & Hemanth, 2011). This is the basis for preferring one model or set of parameters over another, depending on the data at hand. The criterion is usually some form of goodness-of-fit function of the model to the data, perhaps tempered by a smoothing term to avoid overfitting, that is, generating a model with too many degrees of freedom to be constrained by the given data. The third component is the search algorithm (Leung, 2010): the specification of an algorithm for finding particular models and parameters, given data, a model (or family of models), and a preference criterion.

Data Mining Methods and Algorithms

The algorithms used in data mining include k-means. K-means is a simple iterative method that divides a dataset into a user-specified number of clusters, k. It operates on a set of multidimensional data points. The algorithm iterates between two steps until convergence: data assignment as the first step and relocation of means as the second. A limitation of k-means is that it amounts to fitting the data with a mixture of k Gaussians with identical, isotropic covariance matrices. It can be paired with another algorithm to describe clusters that are non-convex.

Support vector machines are also applied in KDD. Their aim is to find the best classification function to distinguish between members of the two classes in the training data. Support vector machines can be extended to perform several other kinds of calculation.

One of the most popular data mining approaches is to find the frequent itemsets in a transaction dataset and derive association rules from them.
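To make the k-means procedure described above concrete, the following is a minimal sketch of its two alternating steps. It is an illustration under stated assumptions rather than a reference implementation: the initial means are drawn at random from the data, distance is squared Euclidean, and the names `kmeans` and `squared_distance` are chosen here for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: start from k data points chosen at random.
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Step 1, data assignment: attach each point to its closest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the method has converged
        labels = new_labels
        # Step 2, relocation of means: move each mean to the centroid of its cluster.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


# Two well-separated groups of made-up points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(points, k=2)
print(means, labels)
```

Because each point is attached to the nearest mean under plain Euclidean distance, the clusters found this way are convex regions of roughly equal spread, which is the limitation noted above.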
The EM algorithm is a flexible, mathematically based approach to modeling and clustering data observed on random phenomena. It can be used to cluster continuous data and to estimate the underlying density function (Leung, 2010). AdaBoost is an algorithm that employs multiple learners to solve a problem.

Data mining application areas

Data mining is used in many areas, including science, business, the web, and government. In science, it is applied in astronomy, drug discovery, and bioinformatics. In business, it is applied in advertising, customer relationship management, manufacturing, investment, sports and entertainment, telecom, e-commerce, healthcare, and targeted marketing. On the web, KDD has been applied in bots and search engines. In government, KDD is applied in law enforcement, profiling of tax cheaters, and anti-terrorism efforts.

In marketing, the main application is database marketing systems, which analyze customer databases to identify different groups of customers and predict their behavior. Businesses are reported to be either using database marketing or planning to use it. In investment, several companies use data mining in their systems but do not describe the processes involved. In fraud detection, data mining has been used to monitor credit card fraud: fraudulent credit card use can be detected through data mining, a step of the KDD process.

Research findings

The main findings of the research concern the different algorithms and the process used in KDD. The process emerged as one that proceeds in steps, with each step depending on the one before. The first step is developing an understanding of the application domain.
This step sets the scene for understanding what is to be done in the many decisions that follow. The next step is the selection and creation of a data set on which discovery will be performed. Once the goals have been defined, the data to be used must be determined. The next step is selecting a target data set, or a subset of data samples, on which study and discovery will be carried out; in this stage the reliability of the data is enhanced. The next step is data transformation, which removes unwanted variables. This step is very important for the entire KDD process and is always project specific. The process then analyzes which useful features can represent the data, depending on the objectives or the task to be performed. The sixth step is choosing the data mining algorithm, that is, selecting the specific method to be used in searching for patterns. The next step is employing the data mining algorithm. This is the final implementation step: the algorithm is run several times until a satisfying result is achieved, by tuning the algorithm's control parameters. The findings also considered different factors that can be used to improve the application.

Conclusion and future research proposals

Knowledge Discovery in Database is an automatic, organized process of identifying valid and useful patterns in large and complex sets of data. Data mining is the core of the process: it involves inferring algorithms that explore the data, develop a model, and discover previously unknown patterns. There is already evidence of results achieved with KDD, but a gap remains: there are still no estimates of the field's potential (Leung, 2010).
This basic analysis should be carried out and used to set directions for future research and implementation. It should include a full taxonomy of the nine steps of KDD. Taxonomies of DM methods exist, but none has been developed for the nine steps. Such a taxonomy should contain the methods appropriate for each step as well as for the whole process. Data mining is a broad area that integrates techniques from several fields, including machine learning, statistics, pattern recognition, artificial intelligence, and database systems, for the analysis of large volumes of data. A large number of data mining algorithms rooted in these fields have been developed to perform different data analysis tasks. The aim of the research was to identify the right process for data mining and to organize the important methods developed in the field into a unified and coherent catalog, presenting the performance evaluation approaches and techniques as well as the cases and software tools that use them. This has been achieved by setting out the main process in KDD.

References

Analysis on medication rule in prescriptions for hemoptysis of Menghe physician Ma Peizhi based on knowledge-discovery in database. (2014). China Journal of Chinese Materia Medica. doi:10.4268/cjcmm20140414

Correction: Human Transporter Database: Comprehensive Knowledge and Discovery Tools in the Human Transporter Genes. (2014). PLoS ONE, 9(5), e98396. doi:10.1371/journal.pone.0098396

Dai, H., Liu, J., & Smirnov, E. (2012). Reliable knowledge discovery. New York: Springer.

DATABASE Editorial Board 2009. (2009). Database, 2009(0), bap023-bap023. doi:10.1093/database/bap023

DATABASE Editorial Board 2010. (2010). Database, 2010(0), baq032-baq032. doi:10.1093/database/baq032

Doreswamy, & Hemanth, K. (2011). Hybrid Data Mining Technique for Knowledge Discovery from Engineering Materials Data Sets. IJDMS, 3(1), 166-177. doi:10.5121/ijdms.2011.3111

Fürnkranz, J., Scheffer, T., & Spiliopoulou, M. (2006). Knowledge discovery in databases. Berlin: Springer.

Kok, J. (2007). Knowledge discovery in databases. Berlin: Springer.

Leung, Y. (2010). Knowledge discovery in spatial data. Heidelberg: Springer.

Rob, P., & Coronel, C. (2002). Database systems. Boston, MA: Course Technology.
