Critical Evaluation of Potential Solutions - Essay Example


ETL: Typical Problems, Critical Evaluation of Potential Solutions

Table of Contents
Introduction
ETL Overview
Typical ETL Problems
Evaluation of Potential Solutions
References

Introduction

This paper explores the typical problems encountered in the extract-transform-load (ETL) process and critically evaluates potential solutions to those problems. ETL is the process of extracting data from various source systems and loading it into a data warehouse for storage and analysis. Data mining is the term associated with extracting insights from the data residing in the warehouse, and the transformation step in ETL prepares the raw data from the source systems for loading into the data warehouse structures. A number of ETL tools exist to automate, partially or wholly, the work of extracting data from systems and organizing it in an appropriate data warehouse structure. The argument is that traditional ETL processes have come to be characterized by rising costs, growing complexity, and a growing amount of data that needs processing and analyzing, whereas alternatives exist that need to be considered in earnest, such as Hadoop and process changes that run many of the ETL steps in parallel (Oracle 2005; Henschen 2012; Janssen 2013; Vassiliadis and Simitsis 2010; Earls 2012).

ETL Overview

ETL is not removed from the day-to-day practices of enterprises of various forms and sizes; it is a natural offshoot of companies wanting to find out more about their customers and themselves by analyzing the data generated by internal systems.
This fundamental desire to know more naturally leads firms to use tools of various levels of sophistication to extract transactional data from existing systems, transform the data, and load it into a database that lends itself to further analysis. In the simplest versions, companies manually extract data from manual or computerized systems and perform the transformation and loading into something as simple as an Excel file. In larger companies, the process can involve intensive programming to extract and prepare large amounts of data from existing systems for later analysis. Whether the source data is complex or simple, and whether the tools are simple or sophisticated, the basic premise is the same: ETL springs from the same motivations and is a natural process in any enterprise (Oracle 2005; Henschen 2012; Janssen 2013; Vassiliadis and Simitsis 2010; Earls 2012).

From historical and process perspectives, ETL tools and processes have evolved rapidly over the past twenty years. The evolution has aimed to make the overall ETL process more productive, less reliant on complex custom programming between the source systems and the data warehouses, and more efficient, so that more time can be devoted to actual analytics on the transformed data. The analytics tools themselves have also come to be incorporated in comprehensive data warehousing packages that cover the end-to-end process, starting with ETL and ending with iterative analytics and reporting on the data in the data warehouses.
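The three steps described above can be illustrated with a minimal sketch. It assumes a small CSV export as the source system and an in-memory SQLite database as a stand-in for the data warehouse; the sample data, the `orders` table, and the function names are hypothetical and not taken from any of the cited tools.

```python
import csv
import io
import sqlite3

# Hypothetical export from a transactional source system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,
"""


def extract(text):
    """Extract: read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, parse amounts, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the prepared rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)

# Analysis happens only after the data has been transformed and loaded.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Even at this scale the division of labor is visible: the analysis query at the end can stay simple only because the transformation step has already cleaned the customer names and removed the incomplete row.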
The table below summarizes the progress in the science and art of ETL over the past two decades (Oracle 2005; Henschen 2012; Janssen 2013; Vassiliadis and Simitsis 2010; Earls 2012):

Table: Comparing the ETL Process Two Decades Ago and at Present (2012)

20 Years Ago | Today
Custom, hand-coded | Almost always vendor-supplied
Focused narrowly on moving data | Usually part of a software suite focused on broader data handling and integration issues
Key aspect of data warehouse | Used for data warehouses, data migration, database sorts and joins, and other processes
Lived on-site | Cloud-based versions emerging as an option

Table Source: Earls 2012

Typical ETL Problems

There is an intense debate in IT trade circles about the value and the ultimate fate of ETL processes and tools. Typical complaints are that ETL takes too much time and money and requires the intense work of an ever-growing pool of specialized talent, all for an activity that, according to some CIOs of large firms, does not add any real value to organizations apart from its use as an intermediary step in data analytics. The real value is in culling insights from the data, but the analytics tools can only deal with data that has been transformed and prepared by the ETL process. From a business and process point of view, therefore, the typical problems of ETL have to do with the growing complexity of the process itself and the growing complexity and size of the data it handles, together with the need for businesses to manage that complexity, reduce costs, and get to the insight-generation or analytics part of the process as fast as possible.
The problem is that current ETL processes, the growth in all kinds of new data (including online data), and the growing complexity of the back-end systems that generate the data all make ETL increasingly unwieldy and expensive (Henschen 2012).

Literature also exists that frames the typical problems of ETL in terms of how ETL tools perform on a number of key metrics: how fast and scalable the tools are; how versatile they are; how easy they are to use, particularly their user interfaces; and how much it costs to run the ETL process and own the tools. These metrics correspond to real-life problems. Scalability and speed concern how fast and how well the ETL process copes with increasing data loads. Versatility concerns how well the process and tools handle complex systems and data without large amounts of manual intervention and re-engineering of the core tools. Cost of ownership concerns the need for businesses to bring costs down; at present, ETL costs are escalating beyond what companies consider acceptable relative to the benefits. User interface problems concern ease of use and the manageability of the complexity of the ETL process. Different organizations encounter these fundamental problems with varying degrees of importance, and all of them require the earnest attention of the IT professionals who use the tools and of the decision makers who purchase them (Business Insight 2010).
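The scalability and speed concern can be made concrete with a small sketch of one common response to growing data loads: splitting the extracted rows into chunks and transforming the chunks concurrently. This is only an illustration under stated assumptions. The transformation (cents to currency units), the chunk size, and the worker count are hypothetical, and threads are used so that the sketch runs anywhere; a CPU-bound transformation in Python would more typically use processes or a distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_chunk(chunk):
    """Hypothetical transformation: convert amounts in cents to currency units."""
    return [cents / 100 for cents in chunk]


def split(rows, size):
    """Split the extracted rows into fixed-size chunks."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_transform(rows, chunk_size=1000, workers=4):
    """Transform chunks concurrently and return results in the original order."""
    chunks = split(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = pool.map(transform_chunk, chunks)
        return [value for chunk in results for value in chunk]


# Stand-in for rows pulled from a source system.
rows = list(range(10_000))
transformed = parallel_transform(rows)
print(len(transformed), transformed[250])
```

The design choice worth noting is that each chunk is transformed independently of the others, which is what allows the work to be spread over more workers as the data load grows.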
Another perspective looks at ETL as something that needs to be extended beyond its traditional use of extracting data from source systems and preparing and loading it for analytics. The purposes are being extended to include the replication of existing data; the management of data for specific purposes; real-time applications, which require ETL to change from a bulk batch process into a more continuous, real-time process; and the growing complexity of cloud applications. All these extensions expose limitations in current ETL processes and tools that need to be overcome with new processes and tools (Earls 2012). Stated another way, these are problems of extending ETL beyond the offline use of data in data warehouses to uses tied to real-time applications, such as on-demand ETL, ETL for data streams, and near-real-time ETL that provides transformed and loaded data as soon as possible after the transactions occur (Vassiliadis and Simitsis 2010, p. 6).

Still another set of perspectives is tied to the formal characterization of ETL processes, which has implications for making them more efficient and more amenable to rigorous optimization studies. Included here is the problem of making ETL faster by optimizing the way several steps are run in parallel. Also included is the problem of standardizing the ETL process so that tools and processes can be aligned with a common way of conceptualizing and executing ETL (Vassiliadis and Simitsis 2010, p. 6).

Evaluation of Potential Solutions

One potential solution lies in a basic paradigm shift in how the analytics of data is approached.
What is being proposed is doing away with the traditional division between the transactional systems that generate data on the one hand and the systems that extract, transform, load, and analyze data on the other. The bottlenecks and costs associated with ETL can be reduced if the systems are optimized to handle the end-to-end process, from data generation at the transaction level all the way to data analysis. Technologies such as Hadoop are said to be steps towards this paradigm shift. In the Hadoop approach, extraction is the only step that overlaps with traditional ETL. Once extracted, data is loaded into the Hadoop environment, where transformation, loading, and analytics are undertaken together with a host of related analytics tools and processes. Cases where data must be ported out of Hadoop for purposes that the available tools do not accommodate are handled as exceptions and managed efficiently, reducing the time for analytics and for special processes inside the data warehouse, so to speak, to fractions of what they were in non-Hadoop environments. In this new environment, the data is “consumed”, or analyzed for insights, in the same place where it is initially loaded after extraction and before the traditional transformation. The data does not have to leave Hadoop once there, and the analytics and related processes can be undertaken and redone within it. Evaluating this solution, there is merit in the proposal, even though the extraction process remains what it was in traditional ETL, and the systems for data generation still exist independently of the Hadoop environment where the analytics takes place.
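The ordering described above, in which data is loaded first and then transformed and analyzed in the same place, can be sketched as follows. This is not Hadoop code and uses no Hadoop API: an in-memory SQLite database is assumed purely as a stand-in for the shared environment, and the `raw_events` table, the `clean_events` view, and the sample records are hypothetical.

```python
import sqlite3

# Hypothetical raw records as they arrive from extraction, with no cleanup yet.
raw_events = [
    ("2013-10-01", " Alice ", "19.99"),
    ("2013-10-01", "bob", "5.00"),
    ("2013-10-02", "ALICE", "7.50"),
]

# SQLite is only a stand-in for the shared environment; it is not Hadoop.
store = sqlite3.connect(":memory:")

# Load first: the raw data lands untransformed.
store.execute("CREATE TABLE raw_events (day TEXT, customer TEXT, amount TEXT)")
store.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", raw_events)

# Transform in place: a cleaned view is defined inside the same store.
store.execute(
    """
    CREATE VIEW clean_events AS
    SELECT day,
           lower(trim(customer)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_events
    """
)

# Analyze in place: the query runs where the data already lives.
spend = store.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM clean_events "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(spend)
```

Because the raw table is kept, the transformation can be redefined and the analysis rerun without extracting the data again, which is the property the proposal emphasizes.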
On the whole, though, this is a step in the direction of the new paradigm described above, because it integrates many of the steps after the extraction stage (Henschen 2012).

Other solutions aim to improve the efficiency of the ETL process and reduce time to completion. They include making use of efficient tools such as SAS, running processes in parallel with more computing and personnel resources, and using more intuitive and friendly user interfaces for the ETL tools to reduce learning curves. These potential solutions may work, but they reduce process completion times only at the cost of additional financial investment in tools, computing resources, and personnel (Business Insight 2010).

Another set of proposals is to evolve the tools and the processes precisely to handle the growth in data volumes and complexity, and to deal with changing needs such as real-time ETL and ETL for streaming data, among others. This is workable, and can be undertaken in conjunction with moves to integrate the analytics and ETL processes into the systems that generate the data (Vassiliadis and Simitsis 2010, p. 6; Business Insight 2010; Henschen 2012).

References

Business Insight. 2010. Data Integration Tools: ETL Tools. Business-Insight.com. http://www.business-insight.com/html/intelligence/bi_ETL.html [Accessed 27 October 2013]

Earls, Alan. 2012. The State of ETL: Extract, Transform and Load Technology. Data Informed. http://data-informed.com/the-state-of-etl-extract-transform-and-load-technology/ [Accessed 27 October 2013]

Henschen, Doug. 2012. Big Data Debate: End Near for ETL? Information Week. http://www.informationweek.com/big-data/news/big-data-analytics/big-data-debate-end-near-for-etl/240143068 [Accessed 27 October 2013]

Janssen, Cory. 2013. Extract Transform Load (ETL). Techopedia. http://www.techopedia.com/definition/24170/extract-transform-load-etl [Accessed 27 October 2013]

Oracle. 2005. 11 Overview of Extraction, Transformation and Loading. Oracle Database Data Warehousing Guide. http://docs.oracle.com/cd/B19306_01/server.102/b14223/ettover.htm [Accessed 27 October 2013]

Vassiliadis, Panos and Alkis Simitsis. 2010. Extraction, Transformation and Loading. IBM Almaden Research Center, University of Ioannina Department of Computer Science & Engineering. http://www.cs.uoi.gr/en/index.php?menu=m1 [Accessed 27 October 2013]