Data Warehousing and Data Mining - Research Paper Example

Summary
This research paper, "Data Warehousing and Data Mining", explains that data warehouses are primarily decision support systems and that this functionality is achieved through data mining. Data mining, or knowledge discovery, is the most important task in data warehousing as far as the usability of the system is concerned. …

Introduction

We are living in an information age, and states around the globe are advancing rapidly towards information-driven, knowledge-based economies. The increased use of information technology in every sphere of life generates a huge amount of data every day. However, there is an increasing realization among information scientists that the knowledge and power extracted from these huge information resources fall far short of their potential, and that most of the data collected globally each day by billions of information systems is never used for the purposes for which it was collected. Moreover, increased data redundancy further exacerbates the situation, and the conversion of data into information, information into knowledge and knowledge into power is very slow. A redundant and dubious information resource is of no use to managers who have to make quick decisions. Managers require precise information that represents and accounts for every aspect of a business. It is the responsibility of a decision support system to answer any query related to the information stored in the system and to generate nontrivial information patterns. These patterns can impart the required business intelligence and inform decisions.

Data Warehouse

There is no consensus on the definition of a data warehouse. In the simplest terms, a data warehouse is a set of applications, concepts, methodologies, tools and techniques for gaining knowledge from historical data, which may come from multiple systems and sources, in order to assist managers in the decision-making process.
Vercellis (2009) offers this definition: “A data warehouse is the foremost repository for the data available for developing business intelligence architectures and decision support systems.” However, this is not a comprehensive definition, and Vercellis (2009) himself admits, “The term data warehousing indicates the whole set of interrelated activities involved in designing, implementing and using a data warehouse.”

Characteristics of a Data Warehouse

A data warehouse has a few important characteristics. These characteristics define the efficiency and effectiveness of the system and determine whether it qualifies as a data warehouse. The most important characteristic of a data warehouse is the strength of its repository, which depends on the availability of sufficient historical and current data. The exact amount of historical and current data is determined by the domain in which the data warehouse is deployed.

Secondly, a data warehouse has to provide ad-hoc access to information sources. This means that there are only a few fixed SQL queries, and most inferences and intelligence are gathered through dynamic, on-the-fly queries. A data warehouse employs several tools, such as data modeling, the star schema and data mining, to ensure ad-hoc access to its resources.

Thirdly, a data warehouse is designed for decision makers and knowledge workers, and these people are not bound to be information technology experts. Because strategic decisions are more concerned with customer trends, behaviors and market forces, knowledge workers are not interested in the individual records of a customer, product or service. Rather, these users require an all-inclusive big picture that helps them make long-term strategic decisions and short-term operational decisions.

How Is It Different?

A data warehouse is essentially different from Online Transaction Processing (OLTP), Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) systems.
Because these systems are not designed and engineered for decision-making and knowledge discovery, they do not hold large volumes of historical data. Secondly, they record the live transactions of the business and keep the records of customers, products and services up to date. A data warehouse, on the contrary, does not record live transactions; its repository is populated from multiple sources such as OLTP, ERP and CRM systems, flat files and several other business information sources. Moreover, data once entered in a data warehouse is never updated to reflect changes that occur in the real world; rather, these changes are simply appended to the existing information repository.

Importance of Historical Data

Data warehousing extracts knowledge from underlying information resources by eliminating redundancy, and thus lends power to decision makers. A data warehouse is quite different from traditional Online Transaction Processing (OLTP) systems, which record information for each business transaction. A data warehouse typically stores a huge amount of historical data that may come from OLTP systems, manual systems, the web, customers and any other source, and which may contain structured or unstructured information. Unlike an OLTP system, a data warehouse does not contain volatile data, which means that data in a data warehouse is never removed or changed when the actual data changes. It may, however, be archived because of internal storage limitations. This is no longer a serious issue, as Cuzzocrea & Umeshwar (2011, p.1) have pointed out: “Nowadays, storage capacity increased significantly at affordable prices.” The storage of historical data is an essential requirement of data warehouses (DWs) because it is used for knowledge discovery, formally known as data mining (DM). DWs can retain all historical, time-stamped data, which helps to generate patterns of customer, product and employee behavior.
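The non-volatile, time-stamped storage described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database, and the table name customer_history, its columns and the helper functions record_change and city_as_of are hypothetical names chosen only for illustration. A change in the real world is appended as a new dated row, and earlier rows are never updated or deleted.

```python
import sqlite3

def record_change(conn, customer_id, city, valid_from):
    """Append a new time-stamped version; existing rows are never updated."""
    conn.execute(
        "INSERT INTO customer_history (customer_id, city, valid_from) "
        "VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )

def city_as_of(conn, customer_id, as_of):
    """Return the city on record for the customer on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM customer_history "
        "WHERE customer_id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

# Hypothetical history table: one row per version of a customer record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history ("
    "customer_id INTEGER, city TEXT, valid_from TEXT)"
)

# The customer moves; the move is appended, the old row stays in place.
record_change(conn, 1, "Lyon", "2009-01-01")
record_change(conn, 1, "Paris", "2010-06-15")

# Both versions remain available for historical analysis.
history_rows = conn.execute(
    "SELECT COUNT(*) FROM customer_history"
).fetchone()[0]
```

Because every version is kept, a question such as "where did this customer live at the end of 2009?" can still be answered after the move, which is the kind of historical query an OLTP system that overwrites records cannot support.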
It is important to note that a data warehouse is not a system that can simply be deployed in an environment; rather, it is a concept that takes shape and evolves through the integration of the various tools and steps of data warehousing. More importantly, a data warehouse is not a warehouse of data in the sense of a system that merely accumulates all historical data; it is a concept that evolves through the integration of multiple systems, algorithms and concepts. Unlike a software application or information system, a data warehouse does not collect individual records; rather, it consumes millions of historical records to search for, learn and present patterns that help to explain the big picture of business operations. Ponniah (2010, p.4) observes, “The operational computer systems did provide information to run day-to-day operations but what the executives needed were different kind of information that could be used readily to make strategic decisions.” A data warehouse provides the all-important strategic insight into a business that helps decision makers to see the state of affairs in the correct perspective and to decide the future direction of the business.

Data warehousing is receiving more attention from corporate business and, with the passage of time, technological and practical deviations from the traditional concepts and methodologies are being introduced. The size of the repository, time variance and data volatility are the areas most affected by these changes to the traditional approach. The size of the repository is now more commonly linked to the business domain in which the data warehouse is to be deployed than to the data warehouse itself. Besides the industry, the cost of storage and the value of the stored data are the other major considerations as far as the size of the repository is concerned. Secondly, deviations are being introduced in the traditional update intervals for the data warehouse.
Traditionally, long update intervals meant that the pictures and analyses presented by the data warehouse lacked the latest business updates. Depending on the industry, these long intervals are being replaced with short intervals between data warehouse updates, and in some cases the interval is even reduced to the transaction level, as in an OLTP system. The availability of a traditional data warehouse is usually kept at twelve hours a day, which may be replaced with round-the-clock operation. The acceptance of these variants is on the rise because they give better value for expensive data warehouse implementations and heavy maintenance costs.

Development of a Data Warehouse

The development life cycle of a data warehouse is altogether different from that of traditional information systems and OLTP systems. The data warehouse life cycle model is known as CLDS, which is simply the reverse of the Software Development Life Cycle (SDLC). “The SDLC is found where there is repetitive operational processing. The CLDS is found where there is informational reporting. As the data warehouse came into existence, operational processing and information processing separated.” (DPD, 2010). The difference in life cycle is based on a difference of approach: an OLTP system runs and records day-to-day business operations, whereas a data warehouse is primarily supposed to find ways in which the business can grow and the areas where changes are required. Data warehouse development is, in effect, upside down: implementation is carried out first, and the collection of user requirements is the last phase of development.

Another major difference in the development process is the de-normalization of relations in a data warehouse. This process is the reverse of the normalization of database relations that is carried out in the development of information systems. However, it must be noted that de-normalization does not mean disorder and chaos.
It is a well-defined domain in software engineering, which requires the implementation of predefined steps. Data warehouses use dimensional modeling to support knowledge discovery. De-normalization is necessary because it makes search operations fast, which is mandatory in data warehousing, where billions of records must be processed to find unknown patterns in heaps of historical data within a few seconds. England & Gavin (2007, p.56) declare, “De-normalization is effective for a data warehouse schema.” It is important to note that a data warehouse is primarily a Decision Support System (DSS), and the star schema is considered the best choice for a DSS. England & Gavin (2007, p.51) declare, “The most desirable result when modeling for a data warehouse using dimensions and facts is called a star schema.”

Extract, Transform and Load

In most cases, a data warehouse does not receive data directly, as an OLTP system does; it is populated from existing sources of data and information. These sources include traditional information systems, OLTP systems, flat file systems and even manual cards and registers. The process of making this data acceptable for a data warehouse and then loading it is formally known as the Extract, Transform and Load (ETL) process. The ETL process is the most important part of implementing a data warehouse because it usually consumes more than fifty percent of the total effort. Gauging the role of ETL, Khan (2010, p.37) notes, “An extremely important and difficult task in building a data warehouse is extracting, transforming and loading a huge amount of data that is stored [in] a variety of disparate legacy systems. The ETL design and development task is also very expensive and can easily consume 50-75% of data warehouse project cost.” ETL is a complex process, which begins with the extraction of information from a host of source systems.
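The star schema and the extract, transform and load flow named above can be sketched together in a few lines. This is only an illustrative sketch under stated assumptions: it uses an in-memory SQLite database, the source records are invented sample data, and the names dim_product, dim_region, fact_sales, transform and dimension_key are hypothetical rather than taken from any of the cited works.

```python
import sqlite3

# Extract: hypothetical raw records from two sources with inconsistent formats.
oltp_rows = [
    {"product": " Widget ", "region": "north", "amount": "10.50"},
    {"product": "widget", "region": "North", "amount": "4.50"},
]
flat_file_rows = [
    {"product": "GADGET", "region": "south", "amount": "20"},
]

def transform(row):
    """Transform: standardize text fields and convert the amount to a number."""
    return (
        row["product"].strip().lower(),
        row["region"].strip().lower(),
        float(row["amount"]),
    )

def dimension_key(conn, table, name):
    """Return the surrogate key of a dimension value, inserting it if new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)
    ).fetchone()[0]

# A tiny star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE dim_region  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product(id),
                              region_id  INTEGER REFERENCES dim_region(id),
                              amount     REAL);
""")

# Load: append one fact row per cleaned source record.
for raw in oltp_rows + flat_file_rows:
    product, region, amount = transform(raw)
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (
            dimension_key(conn, "dim_product", product),
            dimension_key(conn, "dim_region", region),
            amount,
        ),
    )

# A dimensional query: total sales per product across all sources.
sales_by_product = dict(conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.id = f.product_id
    GROUP BY p.name
""").fetchall())
```

In this sketch the three spellings of the product name collapse into two dimension rows during transformation, so the final query reports one total per product regardless of which source system supplied the record.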
The extracted data may be in a variety of formats and needs to be standardized before it can be loaded into a data warehouse. This process of transformation makes the data consistent and cleanses it of various errors. Finally, the transformed data is loaded into the data warehouse, a step that may include indexing and calculations.

Typical Uses of Data Warehouses

Data warehousing is growing by leaps and bounds, and various industries are implementing different flavors of it. Data warehousing can be employed in any industry where analytical reports can make a difference. It can be used to discover target markets and customers, along with their particular needs and requirements, in order to segment them into specific markets and customer groups. A data warehouse can also be deployed to detect fraud and patterns of use of a service or product, and to find the best mix of services and products based on trend analysis. It can help to create bundle offers, design promotions, set variable pricing for products and services, and so on. Industries and sectors where data warehousing can deliver tremendous results include the financial sector, the telecommunication industry, the transportation industry and the agriculture sector.

Data Mining

Data warehouses are primarily decision support systems, and this functionality is achieved through data mining. Data mining, or knowledge discovery, is the most important task in data warehousing as far as the usability of the system is concerned. Data mining is not a simple querying process. It is a technique for uncovering something that is neither queried nor apparent in the data. Data mining extracts an enormous amount of information using patterns, trends and other known and unknown business facts. It is quite a delicate job in data warehousing. Data mining uses several algorithms, such as clustering, classification and regression analysis, to discover knowledge.
These algorithms test various hypotheses to ascertain the presence or absence of a business fact or opportunity. More importantly, data mining brings out business facts that a knowledge worker or decision maker has not actually asked for. Data mining surfaces facts that are buried deep in heaps of information. It may reveal interesting patterns of customer behavior and product and service trends that can be explored further to discover all-important business intelligence.

References

Cuzzocrea, A., Umeshwar D. (Eds.). (2011). Proceedings from 13th International Conference on Data Warehousing and Knowledge Discovery. France: Springer.
DPD. (2010). Data Processing Digest. (Vol 44, Issues 1-11). USA: The University of Michigan.
England, K., Gavin, P. (2007). Microsoft SQL Server 2005 Performance Optimization and Tuning Handbook. USA: Digital Press.
Khan, A. (2003). Data Warehousing 101: Concepts and Implementation. USA: iUniverse.
Ponniah, P. (2010). Data Warehousing Fundamentals for IT Professionals. USA: John Wiley & Sons.
Vercellis, C. (2011). Business Intelligence: Data Mining and Optimization for Decision Making. UK: John Wiley and Sons.
(“Data Warehousing and Data Mining Research Paper”, n.d.)
Retrieved from https://studentshare.org/information-technology/1392752-data-warehousing-and-data-mining
