W5 Assignment: ETL and Data Warehousing

ETL and Data Warehousing
Select a company from the United States. Briefly describe the company's business and its existing or planned data warehouse environment.
The company selected for this paper is American Airlines. American Airlines is reported to be one of the world's largest carriers, with historical roots traced to the 1920s (American Airlines, 2011). It is currently headquartered in Fort Worth, Texas, and was noted to be "one of the largest scheduled air freight carriers in the world, providing a wide range of freight and mail services onboard American's passenger fleet" (American Airlines, n.d., p. 1). The company's data warehouse requirements are served by Sybase (Sybase, 2014). The need for an appropriate ETL provider was driven by the aim of increasing "revenue by reducing fraudulent ticket processing. That meant finding a way to quickly and efficiently query their data warehouse. That meant Sybase" (Sybase, 2014, p. 1). In addition, American Airlines' Sybase-based data warehouse delivered the following results: "detect fraudulent ticket-processing, track ticket sales properly and ensure proper revenue is flowing into the company" (Sybase, 2014, p. 1).
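The fraud-detection use the Sybase case study describes amounts to reconciling ticket sales against posted revenue with a warehouse query. The sketch below illustrates the idea with an in-memory SQLite database; the schema (`ticket_sales`, `revenue_postings`) and the fraud rule (a sold ticket with no matching revenue posting) are illustrative assumptions, not American Airlines' actual design.

```python
import sqlite3

# Hypothetical warehouse tables; invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket_sales (ticket_no TEXT PRIMARY KEY, amount REAL);
    CREATE TABLE revenue_postings (ticket_no TEXT, amount REAL);
    INSERT INTO ticket_sales VALUES ('T100', 450.0), ('T101', 380.0), ('T102', 520.0);
    INSERT INTO revenue_postings VALUES ('T100', 450.0), ('T102', 520.0);
""")

# Flag tickets that were sold but never produced a matching revenue record.
suspect = conn.execute("""
    SELECT s.ticket_no, s.amount
    FROM ticket_sales AS s
    LEFT JOIN revenue_postings AS r ON r.ticket_no = s.ticket_no
    WHERE r.ticket_no IS NULL
""").fetchall()
print(suspect)  # -> [('T101', 380.0)]
```

In a production warehouse the same anti-join runs over millions of rows, which is why query performance was the deciding factor in the case study.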
Research the leading ETL products available in the marketplace and write a comparison of their major features.
The leading ETL products available in the market are diverse. The features are traditionally compared according to the following:
infrastructure, functionality, usability, platforms supported, debugging facilities, data quality/profiling, performance, future prospects, re-usability, scalability, batch vs. real-time, and native connectivity (Passionned Group, 2014, p. 1).
The comparison results are presented feature by feature so that the individual product capabilities can easily be compared across vendors.
The three products compared below are Informatica PowerCenter 5.0, Ascential DataStage XE 5.1, and BusinessObjects Data Integrator (BODI) 11.5.
Architecture
PowerCenter: Client/server architecture
DataStage XE: Client/server architecture
BODI: Client/server architecture
Scalability and Extensibility
PowerCenter: Highly scalable and extensible technology; scales up as the data and load grow, with respect to both hardware and software.
DataStage XE: Highly scalable; scales up with respect to hardware and software.
BODI: Highly scalable; scales up with respect to hardware and software.
Client Platforms
PowerCenter: Windows 2000/NT/98
DataStage XE: Windows 95/NT/2000
BODI: Windows 95/NT/2000
Server Platforms
PowerCenter: Sun Solaris, AIX, HP-UX, Windows NT/2000
DataStage XE: Windows NT (Intel and Alpha platforms); UNIX: AIX, HP-UX, Sun Solaris, COMPAQ Tru64. DataStage XE/390 runs on the OS/390 platform.
BODI: Sun Solaris, AIX, HP-UX, Windows NT/2000
Supported DBMSs for Extraction and Loading
PowerCenter: Sources: DB2, DB2/400, flat files, IMS, Informix, MS SQL Server, MS Access, Oracle, Sybase, UDB, VSAM, ODBC, and others. Targets: Informix, DB2/400, MS SQL Server, MS Access, Oracle, PeopleSoft Enterprise Performance Management (EPM), SAP Business Information Warehouse (BW), Sybase, UDB, flat files, and others.
DataStage XE: QSAM sequential flat files; ISAM; VSAM (KSDS, RSDS, ESDS) with support for GROUPS, multi-level arrays, REDEFINES, and all PICTURE clauses; DB2; Adabas; Oracle OCI (releases 7 and 8); Sybase Open Client; Informix CLI; OLE DB for Microsoft SQL Server 7; ODBC.
BODI: Generic ODBC, HP NeoView, IBM DB2/UDB, Informix IDS, Microsoft SQL Server, MySQL, Netezza, Teradata, Oracle, Sybase Adaptive Server Enterprise (ASE), and Sybase IQ. Native bulk loading is supported for all major databases.
Support for ERP Sources
PowerCenter: Provides PowerConnect modules for connecting to PeopleSoft, Siebel, and SAP R/3. Informatica is releasing an Open PowerConnect API for the remaining ERP systems, so customers can write interfaces using this module.
DataStage XE: Provides full integration with leading enterprise applications, including SAP, Siebel, and PeopleSoft. The DataStage Extract PACKs for SAP R/3, Siebel, and PeopleSoft, together with the DataStage Load PACK for SAP BW, enable warehouse developers to integrate this data with the organization's other data sources.
BODI: JD Edwards OneWorld and World, Oracle e-Business Suite (EBS), PeopleTools, SAP BI and BW Server, SAP ERP and R/3 (via ABAP, BAPI, and IDOC), and Siebel.
Code Reusability
PowerCenter: Supports the development of mapplets, which act as a library between mappings; transformations can also be shared across mappings.
DataStage XE: Permits the reuse of existing code through APIs, thereby eliminating redundancy and the retesting of established business rules.
BODI: Supports code reusability; workflows, dataflows, and tables can be reused.
Parallelism
PowerCenter: Supports parallelism; multiple mapping sessions can run on the same server.
DataStage XE: Automatically distributes independent job flows across multiple CPU processes, ensuring the best use of available resources and speeding up overall processing time for the application.
BODI: Supports parallelism; multiple dataflows/workflows and jobs can run in parallel.
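The parallelism all three engines offer, running independent flows concurrently, can be sketched in a few lines of Python. The three `load_*` functions below are invented stand-ins for independent dataflows (mapping sessions in PowerCenter terms, dataflows in BODI terms), not part of any tool's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-ins for independent dataflows; each returns
# (target table name, rows loaded).
def load_customers():
    return ("customers", 3)

def load_bookings():
    return ("bookings", 5)

def load_fares():
    return ("fares", 2)

dataflows = [load_customers, load_bookings, load_fares]

# Run the independent flows concurrently, much as the engines above
# distribute independent job flows across workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(lambda flow: flow(), dataflows))

print(results)  # -> {'customers': 3, 'bookings': 5, 'fares': 2}
```

The key design point mirrored here is that only flows with no dependencies between them are parallelized; dependent steps still run in order.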
Code Generation
PowerCenter: Does not generate code; all mappings are developed through the GUI.
DataStage XE: Only the DataStage XE/390 version automatically generates and optimizes native COBOL code and JCL scripts that run on the OS/390 mainframe.
BODI: DI automatically generates the appropriate interface calls to access data in the source systems. For most ERP applications, DI generates optimized SQL for the specific target database system (Oracle, DB2, SQL Server, and Informix).
Data Transformation Method (Engine-Based?)
PowerCenter: Based on a hub-and-spoke architecture with a built-in transformation engine.
DataStage XE: Transformation is engine-based, using column-to-column mappings.
BODI: Transformation is engine-based.
Building and Managing Aggregates
PowerCenter: Aggregations can be built using the built-in transformations provided.
DataStage XE: Enhances performance and reduces I/O with built-in sorting and aggregation capabilities. The Sort and Aggregation stages work directly on rows as they pass through the engine rather than depending on SQL and intermediate tables.
BODI: Aggregations can be built using the Query transformation with the help of built-in functions.
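The engine-side aggregation described for DataStage, summarizing rows as they stream through rather than issuing SQL against intermediate tables, can be illustrated with a single-pass accumulator. The route/revenue rows are invented for the example.

```python
# Rows as they might stream out of an extract step (invented data).
rows = [
    {"route": "DFW-JFK", "revenue": 450.0},
    {"route": "DFW-LAX", "revenue": 380.0},
    {"route": "DFW-JFK", "revenue": 520.0},
]

# One pass over the row stream: each row updates its group's running
# total in memory, with no SQL and no intermediate table.
totals = {}
for row in rows:
    key = row["route"]
    totals[key] = totals.get(key, 0.0) + row["revenue"]

print(totals)  # -> {'DFW-JFK': 970.0, 'DFW-LAX': 380.0}
```

This is why in-engine aggregation reduces I/O: the detail rows are consumed once and only the summary is written out.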
Supported Data Types
PowerCenter: Supports most industry-standard data types; this also depends on the kind of source system being used.
DataStage XE: Supports most industry-standard data types, including XML.
BODI: Supports most industry-standard data types, including XML.
Data Quality Checks
PowerCenter: Does not have such a feature; data quality must be handled programmatically.
DataStage XE: Through Quality Manager, it is possible to audit, monitor, and certify data quality at key points throughout the data integration lifecycle.
BODI: Supports data quality checks using different sets of transformations.
Debugging and Logging
PowerCenter: Does not have a separate debugging tool. The workaround is to set the "verbose" property on each transformation, which makes Informatica create log files on the server for further analysis.
DataStage XE: Helps developers verify their code with a built-in debugger, increasing application reliability and reducing the time developers spend fixing errors and bugs. Supports row-by-row debugging with breakpoints, so errors in logic or unexpected legacy data values are detected and corrected immediately. Highly useful for complex transformations, date conversions, etc.
BODI: Supports job execution in debug mode.
Exception Handling
PowerCenter: Writes error or rejected records to a log file.
DataStage XE: Supports exception handling.
BODI: Supports exception handling using Try and Catch blocks.
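Row-level exception handling in these tools amounts to catching a failure on one record and routing it to a reject log rather than aborting the whole load. A minimal sketch, with invented rows and an invented transform (parsing a fare string to a number):

```python
# Incoming rows; one carries a value the transform cannot parse.
rows = [{"fare": "450.0"}, {"fare": "n/a"}, {"fare": "380.0"}]

loaded, rejects = [], []
for row in rows:
    try:
        loaded.append(float(row["fare"]))   # the "transform" step
    except ValueError as err:
        # Route the bad record to a reject log, in the spirit of
        # PowerCenter's rejected-record files or BODI's Catch blocks.
        rejects.append((row, str(err)))

print(loaded)        # -> [450.0, 380.0]
print(len(rejects))  # -> 1
```

The load completes with the good rows while the reject log preserves enough context to reprocess the bad one later.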
How the Tool Reports Exceptions
PowerCenter: Through log files stored on the server.
DataStage XE: Developers can observe running jobs in the Monitor window for run-time feedback at user-selected intervals. The process viewer estimates rows per second and allows developers to pinpoint possible bottlenecks and points of failure. Using the Director, the developer can browse detailed log records as each step of a job completes; these date- and time-stamped records include notes reported by the DataStage Server as well as messages returned by the operating environment or the source and target database systems. Log records are highlighted with colored icons (green for informational, yellow for warnings, red for fatal) for easy identification.
BODI: Several exception categories are available in DI. DI maintains three different logs (trace, error, statistics) at execution time, stored on the server. The trace log shows the start and end times of the job, workflow, and dataflow. The statistics log shows row counts, path names, the state of each DI object (job, workflow, dataflow, transformation), and elapsed and absolute times. The error log shows the name of the object being executed and the description and type of the error that occurred. The Monitor window displays job status with colored icons (green, red, and yellow).
Restarting an Aborted ETL Process
PowerCenter: Supports restarting of mappings.
DataStage XE: Restart is possible, from the point of failure.
BODI: A Data Integrator feature allows unsuccessful jobs to be rerun in recovery mode, restarting from the point of failure.
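Restart-from-point-of-failure is typically built on checkpoints: each completed step is recorded, and a recovery run skips anything already done. The sketch below is a generic illustration of that mechanism, not the actual implementation of any of the tools above; the step names and checkpoint set are invented.

```python
def run_job(steps, checkpoint, fail_on=None):
    """Run steps in order, recording each success in `checkpoint`."""
    for name, action in steps:
        if name in checkpoint:          # already done in a previous run
            continue
        if name == fail_on:             # simulate a mid-job failure
            raise RuntimeError(f"step {name} failed")
        action()
        checkpoint.add(name)

executed = []
steps = [
    ("extract", lambda: executed.append("extract")),
    ("transform", lambda: executed.append("transform")),
    ("load", lambda: executed.append("load")),
]

checkpoint = set()
try:
    run_job(steps, checkpoint, fail_on="load")   # first run aborts at load
except RuntimeError:
    pass

run_job(steps, checkpoint)   # recovery run resumes at the failed step
print(executed)  # -> ['extract', 'transform', 'load']
```

Because the checkpoint persists across runs, extract and transform execute exactly once even though the job ran twice.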
Client Memory Requirement (Minimum/Recommended)
PowerCenter: Minimum 128 MB
DataStage XE: 64 MB
BODI: Minimum 256 MB
Server Memory Requirement (Minimum/Recommended)
PowerCenter: Minimum 256 MB, depending on the nature of the mappings; each PowerCenter session takes around 8 MB of memory, so memory needs grow with the load.
DataStage XE: Minimum 256 MB
BODI: Pentium processor with a minimum of 256 MB of RAM (512 MB recommended) and 100 MB of free disk space; memory-intensive jobs require more free disk space.
Repository Backup and Recovery
PowerCenter: Comes with good features for backup and recovery of the repository, performed through the Repository Manager.
DataStage XE: Supports a distributed repository: remote sites can subscribe to a set of metadata objects within the warehouse application and are notified via email when metadata changes occur within their subscription. DataStage XE also offers version control of table definitions, transformation rules, and source/target column mappings within a two-part numbering scheme.
BODI: An entire repository can be exported to a file. When a repository is exported or imported, jobs and their objects (created in Data Integrator) are automatically exported or imported as well.
Source: ETL Tool Comparison, 2009
References
American Airlines. (2011, November). History of AMR Corporation and American Airlines. Retrieved November 15, 2014.
American Airlines. (n.d.). About American Airlines. Retrieved November 15, 2014.
ETL Tool Comparison. (2009, April). Retrieved November 15, 2014.
Passionned Group. (2014). ETL Tools Comparison. Retrieved November 15, 2014.
Sybase. (2014). American Airlines. Retrieved November 15, 2014.