Important Data Mining Techniques

Data Warehousing and Mining

INTRODUCTION

Data mining is the process of examining data from different viewpoints and transforming it into valuable information, that is, information that can be used to raise income, reduce expenditure, or both. It is also known as data discovery or knowledge discovery. Data mining applies comparatively high computing power to massive collections of data in order to find relationships and regularities between data points, and it draws on techniques from machine learning, statistics and pattern recognition to explore large databases automatically (Frand, 1998) and (Anissimov, 2011). This paper discusses the concept of data mining in detail, covers its main aspects, techniques and algorithms, and assesses its market applications.

DATA MINING

Data mining is a technique used to evaluate business or corporate data from a target source and then turn that data into valuable and useful information. Businesses normally use this information to raise profits or to cut expenditure in specific business areas. The main purpose of data mining applications is to recognize and extract recurring business patterns contained in a given set of corporate data (Bradford, 2011).

IMPORTANT DATA MINING TECHNIQUES

This section outlines some of the most important data mining techniques.

Neural Networks/Pattern Recognition

Neural networks are used in a black-box style. In this technique, an analyst prepares a set of data whose results are already known, which allows the neural network to learn patterns from those known results; the trained network is then applied to the massive amounts of data provided.
For instance, a credit card business can have more than 60,000 data records, of which more than 100 are known to be fraudulent. The training data set teaches the neural network the difference between the fraudulent records and the legitimate records, so that it forms the right kind of patterns (Chicago Business Intelligence Group, 2011), (Han & Kamber, 2006) and (Laudon & Laudon, 1999).

Memory Based Reasoning

This technique can offer the same results as a neural network, but it works differently. Rather than learning general patterns, memory based reasoning searches for "closely related" data records (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

Cluster Detection

This is a standard data mining technique used to assess relationships in market and business transaction data, because it discovers associations from data patterns. It mainly discovers associations among clients or products, or anywhere else we want to discover interactions in data (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

Link Analysis

This is another method for relating similar business records. Although the method itself is not used extensively, a number of methods and software applications have been built on it. As its name states, the technique attempts to discover and reveal associations among transactions, products, consumers and so on (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

Visualization

This data mining method helps users understand their data. Visualization converts associations expressed as text into a visual or graphical form.
In addition, techniques such as rule, decision tree, pattern and cluster visualization let users see data associations rather than read about them. Many powerful data mining systems have substantially improved their visual content over the last few decades (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

Decision Tree/Rule Induction

The decision tree is one of the most popular data mining techniques and is applied to real business and corporate data. Decision trees support classification and present information in a well-expressed form that helps users understand their data. In addition, a decision tree process produces the rules that are involved in a process (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

Genetic Algorithms

Genetic algorithms work in the same manner as bacteria spreading in a petri dish. We set up a business or corporate data set and then give the genetic algorithm the ability to try different things and to judge whether a result or direction is promising. The algorithm then follows a trend that is expected to improve the quality of the final result. Genetic algorithms are frequently used for process optimization, such as the ordering of activities, development, process re-engineering and grouping (Chicago Business Intelligence Group, 2011) and (Han & Kamber, 2006).

ALGORITHMS OF A-PRIORI AND K-MEANS

A-Priori

Huge volumes of information and data are gathered every day in the course of operations in business, administration, banking, government, the delivery of health and public services, security, environmental protection and politics. However, this data is mainly used for accounting purposes and for the administration of client support.
Normally, these management and business data sets are extremely large and continually growing, and they include a great number of complex features. While these data sets are assets of the organizations that hold them, and potentially have a number of uses for their owners, they frequently have a comparatively low density of business information. Organizations therefore need simple, powerful and computationally efficient methods to extract hidden information from such data sets. The development of such methods is the foundation of business data mining (Hegland, 2005).

I will now assess the A-PRIORI algorithm, which is a level-wise algorithm. It scans the transaction database a number of times. After the first scan the frequent 1-item-sets are found, and in general after the kth scan the frequent k-item-sets are extracted. The technique does not compute the support of every possible item-set. Instead, to reduce the domain to be searched, it generates candidate item-sets before each pass. An item-set becomes a candidate if every subset of it is frequent. Clearly every frequent item-set must be a candidate as well, so only the support of the candidates is computed. The frequent k-item-sets produce the candidate (k+1)-item-sets after the kth scan. Once all the candidate (k+1)-item-sets have been generated, a new scan of the transactions is started and the exact support of the candidates is determined. Candidates with low support are discarded. The algorithm stops when no further candidates can be generated (Bodon, 2005) and (Hegland, 2005).
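The level-wise procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation discussed by Bodon (2005): the function name `apriori`, the `min_support` parameter (an absolute transaction count) and the sample baskets are assumptions made for the example, and candidates are stored in plain sets rather than in a hash-tree or trie.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions: iterable of iterables of hashable items.
    min_support: minimum number of transactions an itemset must appear in
                 (hypothetical parameter name, chosen for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items and keep the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Candidate generation: unions of two frequent k-itemsets that have
        # size k+1 and whose k-subsets are all frequent.
        prev = set(frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in prev for sub in combinations(union, k)
            ):
                candidates.add(union)
        if not candidates:
            break

        # Next scan: compute the exact support of each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Discard candidates with low support.
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up market baskets, used only to exercise the function.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With these baskets only one 3-item-set survives candidate generation, and it is then discarded because its support is too low, which shows how the subset check keeps the number of counted candidates small.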
The reasoning behind candidate generation rests on a simple fact: every subset of a frequent item-set is frequent. This is immediate, because if a transaction t supports an item-set X, then t supports every subset of X. Using this fact indirectly, we can conclude that if an item-set has a subset that is infrequent, then it cannot be frequent. Consequently, in the A-PRIORI algorithm only those item-sets whose every subset is frequent become candidates. The frequent k-item-sets are available when we try to generate the candidate (k+1)-item-sets, so the algorithm looks for candidate (k+1)-item-sets among the sets that are unions of two frequent k-item-sets. However, after forming a union X we must confirm that all of its subsets are frequent; otherwise it cannot be a candidate. For this purpose, it is sufficient to verify that all the k-subsets of X are frequent (Bodon, 2005) and (Hegland, 2005). To carry out this task efficiently, the A-PRIORI algorithm uses a technique known as a hash-tree, though in this implementation a 'trie' (that is, a prefix tree) is used. Tries have several advantages over hash-trees (Bodon, 2005) and (Hegland, 2005):

1. It is faster.
2. It requires no parameters (a major drawback of a hash-tree is that its performance is extremely sensitive to its parameters).
3. The process of candidate generation is extremely simple.

K-Means

The K-means algorithm was developed by MacQueen in 1967 and is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.
The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (say, k clusters) fixed a priori. The basic idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations typically produce different results. The better choice is therefore to place them as far away from each other as possible. The next step is to take each point belonging to the given data set and associate it with the nearest centroid. When no point is pending, the first step is finished and an early grouping is complete. At this point, we need to compute k new centroids as barycenters of the clusters resulting from the previous step. Once we have these k new centroids, a new binding has to be made between the same data set points and the nearest new centroid. In this way a loop is produced. As a result of this loop, the k centroids change their location step by step until no more changes occur; in other words, the centroids do not move any longer (Matteucci, 2010), (Weisstein, 2011) and (Inmon & Hackathorn, 1994).

The K-means algorithm clusters, or partitions, N data points into K disjoint subsets $S_j$ containing $N_j$ data points so as to minimize the sum-of-squares criterion (Weisstein, 2011):

$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$

Here $x_n$ is a vector representing the nth data point and $\mu_j$ is the geometric centroid of the data points in $S_j$. In general, the algorithm does not achieve a global minimum of J over the assignments. In fact, since the algorithm uses discrete assignment rather than a set of continuous parameters, the "minimum" it reaches cannot even properly be called a local minimum.
Nevertheless, despite these limitations, the K-means algorithm is widely used in practice because it is simple to implement. The algorithm consists of a straightforward re-estimation procedure. Initially, the data points are assigned at random to the K sets. In the first step, the centroid is computed for each set. In the second step, every point is assigned to the cluster whose centroid is closest to that point. These two steps are alternated until a stopping criterion is met, for example when there is no further change in the assignment of the data points (Matteucci, 2010), (Weisstein, 2011) and (Inmon & Hackathorn, 1994).

According to Leeser (1999), there are numerous variants of the k-means algorithm for clustering data, but most of them involve an iterative scheme that operates over a fixed number of clusters while attempting to satisfy the following properties:

1. Each class has a center, which is the mean position of all the samples in that class.
2. Each sample is in the class whose center it is closest to.

K-means is one of the simplest unsupervised learning algorithms; it partitions feature vectors into k clusters so that the within-group sum of squares is minimized. The procedure follows a simple way to classify a given data set and looks very similar to the following steps (Spehr & Winkelbach, 2011):

1. Place the initial group centroids randomly into the 2D space.
2. Assign each object to the group that has the closest centroid.
3. Recalculate the positions of the centroids.
4. If the positions of the centroids did not change, go to the next step; otherwise go to Step 2.
5. End.
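The steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `kmeans`, its parameters and the sample points are hypothetical, the initial centroids default to k randomly chosen data points (one common way of placing them), and an empty group simply keeps its previous centroid.

```python
import random


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Cluster 2D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: choose the initial centroids (here: k distinct data points,
    # unless the caller supplies starting positions).
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)

    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to the group with the closest centroid
        # (squared Euclidean distance; ties go to the lower group index).
        labels = [
            min(
                range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2,
            )
            for x, y in points
        ]

        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    (
                        sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members),
                    )
                )
            else:
                # An empty group keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Made-up 2D points with fixed starting centroids so the run is repeatable.
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
centroids, labels = kmeans(points, 2, initial_centroids=[(1, 1), (5, 7)])
```

On this small data set the loop settles after three passes, with the first two points in one group and the remaining five in the other. Different starting centroids can lead to a different grouping, which is the sensitivity to initial placement noted in the text.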
Figure 1: Steps of the k-means clustering algorithm. Image source: http://people.revoledu.com/kardi/tutorial/kMean/Algorithm.htm

The changes that the continuous k-means algorithm makes to the standard algorithm greatly speed up the clustering procedure. Because both the reference points and the data points used for the updates are selected by random sampling, more reference points will be found in the densest regions of the data set, and the reference points will be updated by data points in the most critical regions. When applied to a large data set, the algorithm usually converges to a solution after only a small fraction (10-15%) of all the points has been examined. This rapid convergence distinguishes continuous k-means from less efficient algorithms. Moreover, clustering with the continuous k-means algorithm is about ten times faster than clustering with Lloyd's algorithm (Faber, 1994) and (Zhang et al., 2008).

ANALYTIC SQL

SQL (Structured Query Language) is a special-purpose database programming language used to define, access and manipulate business and corporate data. SQL is regarded as a nonprocedural language: it describes the necessary components (for example tables) and the desired results without stating exactly how those results should be computed. Every SQL implementation sits on top of a database engine, whose job is to interpret SQL statements and decide how the data structures in the database should be navigated to produce the required results correctly and efficiently.
The SQL language comprises two distinct sets of commands: DDL (Data Definition Language) is the subset of SQL used to define and modify data structures, while DML (Data Manipulation Language) is the subset used to access and manipulate the data held in the structures previously defined through DDL. DDL contains numerous commands for tasks such as creating tables, indexes, views and constraints (eTutorials.org, 2011).

Analytic SQL (ASQL) Server is an implementation of a modern data warehouse paradigm (at present available only for the PostgreSQL server), with complete OLAP (online analytical processing) support and analytical functionality (statistical models, mathematical models and so on) on the side of the SQL server. The prime characteristic of ASQL is that it supports the design of very large-scale analytic and business intelligence solutions, overcoming known limitations of present-day database systems. In addition, ASQL is described as one of the first integrated business intelligence systems available in the marketplace. Its major applications are systems in the business intelligence category, such as financial and corporate implementations (ASQL Group, 2007) and (Oleszkiewicz, 2008):

1. Banking: credit scoring, BASEL II, fraud, prediction, money laundering
2. Finance: financial planning, controlling, MiFID, prediction
3. Healthcare
4. Insurance (Solvency II)
5. General-purpose business data analysis

Separately, Oracle has introduced a number of additions to ANSI SQL that let business knowledge workers compute rollups and aggregations quickly.
These extensions are known as "analytic SQL". Analytic SQL Server supports the following OLAP operations on the database server (Oleszkiewicz, 2008):

1. Drill Down
2. Pivoting
3. Slicing
4. Dicing
5. Roll Up
6. Unroll vector to table/tree
7. Drill Across, Drill Through
8. Bulk vector operation
9. Rollup table/tree into vector

Notably, the Roll-Up and Drill-Down operations are built on standard PostgreSQL SQL syntax and do not need any additional SQL clauses such as ROLLUP or CUBE (ASQL does not implement these clauses the way Oracle does, because they are unnecessary). These simple SQL operators allow us to create simple aggregations directly within SQL, without using SQL*Plus break and compute statements (Oleszkiewicz, 2008) and (Remote DBA, 2011).

The business intelligence extensions include ROLAP (relational OLAP) with full support for MOLAP (multidimensional OLAP). Integrating the two technologies means that both MOLAP and ROLAP translate directly into the ability to represent and process multidimensional data through the standard SQL interface, which is rarely possible in other solutions. The system listed in the vendor's products section, known as SART, is an example of a solution in which the OLAP model has been built on irregular dimensions, more specifically on a Chart of Accounts (ASQL Group, 2007). Analytic SQL Server also allows us to use advanced analysis based on statistical and mathematical models. Like the business intelligence features, the computational intelligence extensions are integrated within the database management system and are accessible through the standard SQL interface.
However, this characteristic is not available in currently marketed business intelligence and decision support systems (ASQL Group, 2007). Throughout the design and development of Analytic SQL, a great deal of attention was given to system performance, which resulted in very efficient data processing. On large data warehouses (holding data covering a period of 14 months), multidimensional analyses were carried out in times that did not exceed 10 seconds, and more than 98 percent of the analyses completed in under 5 seconds. These technical properties make implementing data warehouses with advanced data analysis a low-effort task and make it easy to meet user requirements. In addition, the homogeneity of the Analytic SQL system has a significant impact on reducing license costs, because it does not require customers to pay for third-party licenses and so reduces the cost associated with multiple systems and tools (for example statistical programs and DBMS) (ASQL Group, 2007).

The queries issued by DSS (Decision Support Systems) differ in nature from those issued against OLTP systems. DSS queries are used by analysts, managers, marketing executives and others to spot corporate and market trends, identify outliers, discover business opportunities and forecast future company performance. While such queries can easily be expressed in English, they have traditionally been hard to formulate in SQL for the following reasons:
They can require different levels of aggregation of the same business and corporate data.
They can involve intra-table comparisons (comparing one or more rows in a table with other rows in the same table).
They can require an additional filtering step after the result set has been sorted (for example, finding the top 5 and bottom 5 salespeople for the previous month).

Although it is possible to produce the required results with SQL features such as inline views, self joins and user-defined functions, the resulting queries can be difficult to understand and may have unacceptably long execution times (source: http://etutorials.org/SQL/Mastering+Oracle+SQL/Chapter+14.+Advanced+Analytic+SQL/14.1+Analytic+SQL+Overview/#).

IMPACT OF PARALLEL COMPUTING TECHNOLOGIES ON DATA WAREHOUSING AND DATA MINING

The complex nature of data mining and the size of corporate data require performance that can only be obtained from very powerful parallel computing systems. In data mining, parallel processing produces significant advantages by spreading the workload across numerous processors, minimizing system expenses without sacrificing results. Parallel computing also offers a large performance improvement to data warehousing by allowing users to divide a problem into tasks that can be executed at the same time. Despite recognizing the power of parallel computing in data mining, many people mistakenly assume that assembling a parallel data warehousing system is a costly and complicated choice. While that may have been the case at one time, Informix, Sun Microsystems and Torrent Systems have joined together to offer a powerful, user-friendly, inexpensive, scalable, high-performance parallel data warehousing solution (Sun Microsystems, Inc., 1997) and (Garcia-Molina et al., 1998).
Businesses ready to make a commitment to parallel data warehousing should recognize the benefits of selecting Sun's Ultra Enterprise family of symmetric multiprocessing (SMP) servers as their computing platform. The outcome of more than 10 years of development in close collaboration with its business and technical customers, Sun's SMP architecture offers the characteristics necessary for successful data warehousing projects, including exceptional flexibility, extensibility and complete binary compatibility across an extensive product line (Sun Microsystems, Inc., 1997) and (Garcia-Molina et al., 1998).

The scale and high dimensionality of the business data sets typically available as input to the problem of association rule generation make it an ideal problem to solve on multiple processors in parallel. The main reasons are the memory and CPU speed limitations of a single processor. It is therefore critical to design efficient parallel algorithms for the task. Another reason for using parallel algorithms is that many transaction databases are already held in parallel databases or are distributed across multiple sites to begin with. The cost of bringing them all to one site or one computer system in order to search for association rules serially can be prohibitively high. For compute-intensive applications, parallelization is an obvious way to improve performance and achieve scalability. Various techniques can be used to distribute the workload involved in data mining over multiple processors (Paul, 2011), (Sun Microsystems, Inc., 1997) and (Garcia-Molina et al., 1998).
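As an illustration of one way to spread association-rule work over several workers, the sketch below partitions the transactions, lets each worker count candidate item-sets in its own partition, and then adds the local counts together. This data-parallel scheme is not taken from the sources cited above; the function names `count_partition` and `parallel_support`, the `workers` parameter and the sample baskets are assumptions made for the example. Threads are used only to keep the sketch self-contained. In CPython a process pool, or separate machines, would be needed for a real speed-up on CPU-bound counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    local = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                local[candidate] += 1
    return local


def parallel_support(transactions, candidates, workers=2):
    """Split the transactions across workers and sum their local counts."""
    transactions = [frozenset(t) for t in transactions]
    candidates = [frozenset(c) for c in candidates]
    # Round-robin partitioning: worker i gets transactions i, i+workers, ...
    partitions = [transactions[i::workers] for i in range(workers)]

    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(count_partition, partitions, [candidates] * workers):
            total.update(local)  # Counter.update adds the counts together
    return total


# Made-up market baskets and candidate item-sets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
support = parallel_support(baskets, [{"bread", "milk"}, {"diapers", "beer"}], workers=2)
```

Because only the small tables of counts are combined at the end, the transactions themselves never have to be moved to a single site, which addresses the cost concern raised in the paragraph above.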
In parallel data warehousing and data mining, task-parallel algorithms assign portions of the search space to separate processors. The task-parallel approaches can be further divided into two groups. The first group is based on a "divide and conquer" strategy that splits the search space and assigns each partition to a specific processor. The second group is based on a task queue that dynamically assigns small portions of the search space to a processor whenever it becomes available. For example, a task-parallel implementation of decision tree generation will form tasks associated with the branches of the tree. Here a divide and conquer approach is a natural reflection of the recursive nature of decision trees (Paul, 2011), (Sun Microsystems, Inc., 1997) and (Garcia-Molina et al., 1998).

DATA MINING APPLICATION TO MARKETING

In this section I discuss the application of data mining in the field of marketing, to assess its performance and the improvements made possible by new technology-based systems. At present, computer technologies offer marketing enormous potential for storing and gathering data from surveys, interviews and other sources. This can be valuable information that helps a business increase its return on investment (ROI), improve customer relationship management (CRM), reduce marketing and promotional expenses, and so on. For instance, to be successful, businesses have to be proactive and anticipate what a client needs. Customer profiling provides the foundation for starting what vendors call a "conversation" with clients. In addition, segmenting customers into groups allows businesses to improve the response to direct marketing campaigns by targeting each campaign at the relevant people.
It is also possible to handle the large volumes of information and data gathered in databases and to react swiftly to clients' requirements and demands (Smirnov-M, 2007) and (Exforsys Inc., 2006).

To mine customer data and profile customers' habits, organizations can use their existing customer records or a database of interviews and surveys. Such data can provide information about customers' hobbies, buying behavior and daily needs and, of course, more personal information such as gender, age, personal income, marital status and so on. These details can be analyzed by data mining applications. The best way to store corporate client or consumer data is in the organization's large structured databases. This helps in using the most significant profile factors throughout the data mining process. For instance, if an organization is interested in targeting a particular gender and age group, the database should hold those details. Furthermore, the database should hold other required information, for instance whether the customer is interested in our services or products (yes/no/does not know about the service or product/not sure, etc.) and the customer's attitude to the type of advertising we intend to use (positive/negative/neutral) (Smirnov-M, 2007) and (Exforsys Inc., 2006).

After building the customer groups, it is important to build and maintain a database that holds data about all the members of the customer clusters. Data mining software will then produce a summary profile for each cluster. In addition, after mining the customers' transaction and personal data in the customer database, the newly discovered information can be used not only for targeted marketing campaigns but also for rapid communication with every new customer of the business.
Moreover, after obtaining some details about a new customer, an organization will be able to classify the type of customer and forecast his or her requirements and demands, to the mutual benefit of business and client (Smirnov-M, 2007) and (Exforsys Inc., 2006).

At present, the banking sector uses new database marketing methods to identify its top and most valuable customers. In this way banks can successfully target clients for mortgage promotions, predict customer retention and run consistent marketing operations. In addition, many telephone companies use data mining techniques to examine clients' calling patterns. Consequently, they are able to recommend the best plan for every new customer from the very beginning of the relationship. Moreover, customer database marketing analysis methods are a necessity for managing the large volumes of data currently available. Furthermore, data mining methods are not difficult to understand and apply, particularly considering the valuable data on customers, customer purchases and activity patterns, as well as other valuable information that can have a crucial influence on business returns and earnings (Smirnov-M, 2007) and (Exforsys Inc., 2006).

Wal-Mart Data Mining Software

This section presents an analysis of data mining technology at Wal-Mart. Wal-Mart has deployed data mining software in its production support infrastructure. According to NeoVista Software, Inc., Wal-Mart Stores, Inc. has deployed NeoVista's Decision Series (TM) suite of integrated data mining software in its renowned replenishment and decision support system.
Wal-Mart, the world's biggest retailer with sales of over $104.4 billion per year, is using the Decision Series software to analyze store item seasonality and to build more accurate predictive models for its automated replenishment systems (Wal-Mart Stores, 2011). NeoVista entered an agreement with Wal-Mart to customize the Decision Series data mining system, tailoring the application to Wal-Mart's specific needs and complementing Wal-Mart's existing database architecture. Wal-Mart runs its business by capturing point-of-sale (POS) transaction data from all of its stores, which is kept in a 24-terabyte Teradata data warehouse from NCR. Mining this data warehouse is part of Wal-Mart's quest to deliver what its customers want: the "right item" in the "right store" at the "right time" and at the "right price". The new NeoVista Decision Series is helping Wal-Mart to look at individual items for individual stores in order to determine seasonal sales profiles. This extra level of detailed analysis is helping Wal-Mart to make even better-informed business decisions and to make additional use of the corporate data warehouse (Wal-Mart Stores, 2011).

The effect of Wal-Mart's new data mining systems has been extremely positive, both in the accuracy of predictions and in reduced response time. The Decision Series also helped Wal-Mart discover that seasonality was present in the sales of basic staples such as mouthwash and cat and dog food. Acquiring this kind of detailed intelligence about all of its items will give Wal-Mart a leading competitive advantage.
Thus, the result would be an improvement in stock levels that increases returns, reduces transport costs and raises the in-stock rate, all of which would have a very positive effect on Wal-Mart’s bottom line (Wal-Mart Stores, 2011).

Conclusion

Data mining is the method of analyzing data from different perspectives and transforming it into valuable information (information that can be used to raise income, reduce expenditures, or both). Organizations can use this information in decision making. This paper has presented a detailed overview of data mining and discussed its main techniques. At present, information has become a very important asset for almost every organization, and data mining techniques can be very helpful in making the best use of it. This paper has also discussed the case of Wal-Mart, which has adopted data mining techniques in order to run its business smoothly.

References

Anissimov, M., 2011. What is Data Mining? [Online] Available at: http://www.wisegeek.com/what-is-data-mining.htm [Accessed 01 April 2011].
ASQL Group, 2007. Welcome to Analytic SQL Server Homepage. [Online] Available at: http://www.analyticsql.org/ [Accessed 28 March 2011].
Bodon, F., 2005. Apriori Class Reference. [Online] Available at: http://www.cs.bme.hu/~bodon/en/apriori/Documentation/html/classApriori.html [Accessed 30 March 2011].
Bradford, C., 2011. What Are the Different Types of Data Mining Techniques? [Online] Available at: http://www.wisegeek.com/what-are-the-different-types-of-data-mining-techniques.htm [Accessed 29 March 2011].
Chicago Business Intelligence Group, 2011. Data Mining Techniques. [Online] Available at: http://www.chicagobigroup.com/business_intelligence_white_papers/business_intelligence_data_mining_techniques.pdf [Accessed 30 March 2011].
eTutorials.org, 2011. 1.1 What Is SQL? [Online] Available at: http://etutorials.org/SQL/Mastering+Oracle+SQL/Chapter+1.+Introduction+to+SQL/1.1+What+Is+SQL/# [Accessed 27 March 2011].
Exforsys Inc., 2006. Data Mining - Data Mining Applications. [Online] Available at: http://www.exforsys.com/tutorials/data-mining/data-mining-applications.html [Accessed 29 March 2011].
Faber, V., 1994. Clustering and the Continuous k-Means Algorithm. Los Alamos Science, 22, pp.138-44.
Frand, J., 1998. Data Mining: What is Data Mining? [Online] Available at: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm [Accessed 01 April 2011].
Garcia-Molina, H., Labio, W.J., Wiener, J.L. & Zhuge, Y., 1998. Distributed and parallel computing issues in data warehousing. In PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing. New York, USA: ACM.
Han, J. & Kamber, M., 2006. Data Mining: Concepts and Techniques. 2nd ed. Boston: Elsevier Inc.
Hegland, M., 2005. The Apriori Algorithm a Tutorial. [Online] Available at: http://www2.ims.nus.edu.sg/preprints/2005-29.pdf [Accessed 31 March 2011].
Inmon, W.H. & Hackathorn, R.D., 1994. Using the Data Warehouse. New York: Wiley.
Laudon, K.C. & Laudon, J.P., 1999. Management Information Systems. 6th ed. New Jersey: Prentice Hall.
Leeser, M., 1999. Overview of K-Means Clustering. [Online] Available at: http://www.ece.neu.edu/groups/rpl/projects/kmeans/ [Accessed 29 March 2011].
Matteucci, M., 2010. A Tutorial on Clustering Algorithms. [Online] Available at: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html [Accessed 29 March 2011].
Oleszkiewicz, B., 2008. [GENERAL] Analytic SQL Server - next generation analytic Data Warehouse with OLAP support. [Online] Available at: http://www.mail-archive.com/pgsql-general@postgresql.org/msg110420.html [Accessed 27 March 2011].
Paul, S., 2011. Parallel and Distributed Data Mining. [Online] Available at: http://www.intechopen.com/articles/show/title/parallel-and-distributed-data-mining [Accessed 29 March 2011].
Remote DBA, 2011. Oracle: Analytical SQL functions - rollup - cube. [Online] Available at: http://www.remote-dba.net/pl_sql/t_analytic_functions_oracle_rollup_cube.htm [Accessed 26 March 2011].
Smirnov-M, H., 2007. Data Mining and Marketing. [Online] Available at: http://www.estard.com/data_mining_marketing/data_mining_campaign.asp [Accessed 29 March 2011].
Spehr, J. & Winkelbach, S., 2011. The k-means algorithm. [Online] Available at: http://www.rob.cs.tu-bs.de/content/04-teaching/06-interactive/Kmeans/Kmeans.html [Accessed 28 March 2011].
Sun Microsystems, Inc., 1997. Parallel Data Warehousing: Assembling a Complete Solution. [Online] Available at: http://www.sun.com/third-party/dw/whitepapers/med-sun_informix.pdf [Accessed 29 March 2011].
Wal-Mart Stores, 2011. Wal-Mart Deploys Data Mining Software Into Its Production Support Environment. [Online] Available at: http://walmartstores.com/pressroom/news/4008.aspx [Accessed 29 March 2011].
Weisstein, E.W., 2011. K-Means Clustering Algorithm. [Online] Available at: http://mathworld.wolfram.com/K-MeansClusteringAlgorithm.html [Accessed 28 March 2011].
Zhang, Z., Zhang, J. & Xue, H., 2008. Improved K-Means Clustering Algorithm. 2008 Congress on Image and Signal Processing, 5, pp.169-72.
