W5 asig ETL and Data Warehousing Essay Example | Topics and Well Written Essays

ETL and Data Warehousing al Affiliation ETL and Data Warehousing Select a company from the United s. Briefly describe the company’s business and its existing or planned data warehouse environment. The company selected to comply with the requirements of the paper is American Airlines. American Airlines is reported to be one of the world’s largest carriers with historical roots traced in the 1920s (American Airlines, 2011). It is currently headquartered in Fort Worth, Texas and was noted to be “one of the largest scheduled air freight carriers in the world, providing a wide range of freight and mail services onboard Americans passenger fleet” (American Airlines, n.d., p. 1). The company’s data warehouse requirements are being served by Sybase (Sybase, 2014).

Accordingly, the need for locating an appropriate ETL provider was the aim of increasing “revenue by reducing fraudulent ticket processing. That meant finding a way to quickly and efficiently query their data warehouse. That meant Sybase” (Sybase, 2014, p. 1). In addition, the data warehouse system of American Airlines through Sybase enabled the provision of the following results: “detect fraudulent ticket-processing, track ticket sales properly and ensure proper revenue is flowing into the company” (Sybase, 2014, p. 1).Research the leading ETL products available in the marketplace and write a comparison of their major features.

The leading ETL products available in the market are diverse. The features are traditionally compared according to the following: “√ Infrastructure √ Functionality √ Usability√ Platforms supported √ Debugging facilities √ Data Quality / profiling√ Performance √ Future prospects √ Re-usability√ Scalability √ Batch vs Real-time √ Native connectivity”(Passionned Group, 2014, p. 1)The comparison of results is presented in a tabular format so that the individual product features are easily compared across vendors.

Version----> 5.0 InformaticaPowerCenter5.1Ascential Data Stage XE11.5 BODIArchitectureArchitectureClient and Server ArchitectureClient Server ArchitectureClinet Server ArchitectureScalable and Extensible TechnologyHighly scalable and extensible technology. Scale up as the data and load grows. Scales up w.r.t the hardware and softwareHighly scalable Scales up w.r.t the hardware and softwareHighly scalable Scales up w.r.t the hardware and softwareClient PlatformWindows 2000/NT/98Windows 95/NT/2000Windows 95/NT/2000Server PlatformsSun Solaris, AIX, HP-UNIX, Windows NT/2000Windows NT ( Intel and Alpha Platforms ), UNIX AIX, HP-UX, Sun Solaris, COMPAQ Tru64.

Data Stage XE 390 works on OS/390 platform.Sun Solaris, AIX, HP-UNIX, Windows NT/2000Which DBMS are supported for extraction and loadingFor Extraction: DB/2, DB/2 /400,Flat Files, IMS, Informix, MS SQL Server, MS Access, Oracle, Sybase, UDB, VSAM, ODBC, OthersTargets: Informix DB/2 /400,MS SQL Server, MS Access, Oracle, PeopleSoft Enterprise Performance Management (EPM), SAP® Business Information Warehouse (BW), Sybase, UDB, Flat Files, OthersQSAM: Sequential flat files ISAM: VSAM: KSDS, RSDS, ESDS - support GROUPS, multi-level arrays, REDEFINES, and all PICTURE clauses.

DB2, Adabas, Oracle OCI ( For releases 7 and 8 ) , Sybase Open Client , Informix CLI , OLE/DB for Microsoft SQL Server 7, ODBC.Generic ODBC, HP NeoView, IBM DB2/UDB, Informix IDS, Microsoft SQL Server, mySQL, Netezza, Teradata, Oracle, Sybase Adaptive Server Enterprise (ASE), Sybase IQ. Native bulk loading supported for all major databases.Support for ERP SourcesProvides PowerConnect modules for connecting to PeopleSoft, Siebel and SAP R/3. Informatica is coming out with an Open PowerConnect API for rest of the ERP systems.

So customers can write interfaces using this module.DataStage XE provides full integration with leading enterprise applications including SAP, Siebel, and PeopleSoft. The DataStage Extract PACKs for SAP R/3, Siebel and PeopleSoft, and the DataStage Load PACK for SAP BW enable warehouse developers to integrate this data with the organizations other data sources.JD Edwards OneWorld and World, Oracle e-Business Suite (EBS), PeopleTools, SAP BI and BW Server; SAP ERP and R/3 via ABAP, BAPI, and IDOC, Siebel, SalesForce.

comCode Reusability capability within the productSupports development of Mapplets which acts as library between Mappings and also can make transformations shareable across Mappings.Permits the reuse of existing code through APIs thereby eliminating redundancy and retesting of established business rulesSupports Code reusablitiy, we can reuse Workflows, Dataflows, Tables, Parallelism Supports parallelism, one can run multiple mapping session on the same server.Automatically distributes independent job flows across multiple CPU processes.

This feature ensures the best use of available resources and speeds up overall processing time for the application.Supports parallelsim, one can run multiple dataflows/work flows parallelly and run jobs parallellyCode GeneratorPowerCenter does not generate code, all the mappings developed will be inform of GUI interface.Only DataStage XE/390 version automatically generates and optimizes native COBOL code and JCL scripts that run on the OS/390 mainframe.DI Automatically generates an appropriate interface calls to access the data into the source systems.

For most ERP applications DI generates optimized SQL for the specific target database systems (Oracle, DB2, SQL Server, and Informix)Data Transformation Method (Engine Based ?)PowerCenter is based on Hub & Spoke architecture and has inbuilt Transformation engine.Transformation is engine based - column-to-column mappingsTranformation is engine based.Building & Managing AggregatesAggregation can be built using the built in transformation provided.Enhances performance and reduces I/O with its built-in sorting and aggregation capabilities.

The Sort and Aggregation stages of DataStage work directly on rows as they pass through the engine rather than depending on SQL and intermediate tables.Aggregation can be built using the Query tranformation with the help of builit-n functions Support for various data typesSupports most of the industry standard data types. This also depends on the kind of source system being used.It supports most of the industry standard data types. It supports XML also.It supports most of the industry standard data types.

It supports XML also.Data Quality Check functionality or featureDoes not have such feature. Needs to be handled prorgramatically.Through Quality Manager it is possible to audit, monitor, and certify data quality at key points throughout the data integration lifecycle.Supports Data quality check using different sets of transformationsDebugging and logging featuresDoes not a separate debugging Tool. The workaround is by setting the "verbose" property on each transformation. By this Informatica will create log files in the server, which can be used for further analysis.

Helps developers verify their code with a built-in debugger thereby increasing application reliability as well as reducing the amount of time developers spend fixing errors and bugs. Supports debugging on row-by-row basis using break points. DataStage immediately detects and corrects errors in logic or unexpected legacy data values using this. Highly useful for complex transformation, date conversions etc.It does supports job execution in Debug modeException HandlingThrows out the error records or rejected records into a log fileSupports exception handling.

Supports exception handling using Try, Catch blocksHow Tool Provides information about exceptionThrough log files stored in the serverDevelopers can closely observe the running jobs in the Monitor Window to provide run-time feedback on user-selected intervals. The powerful process viewer estimates rows-per-second and allows developers to pinpoint possible bottle-necks and/or points of failure. Using the Director, the developer can browse detailed log records as each step of a job completes. These date and time stamped log records include notes reported by the DataStage Server as well as messages returned by the operating environment or source and target database systems.

DataStage highlights log records with colored icons (green for informational, yellow are warnings, red for fatal) for easy identification.There are serveral exception categories available in DI. DI maintains 3 different logs (Trace, Error, Statistics) at the time of execution and these log files are stored in the server. Trace log displays information start & endi time of the job, workflow, dataflow. Statistics log window displays row count, path name, state of the DI object(job, wf,df, tranformations) and elapsed & absolute time.

Error log displays the name of the object being executed, description & type of error occured. Monitor window displays job status with colored icons (green, red and yellow)Restarting an aborted ETL processSupport restarting of the mappingsRestart is possible. Can restart from the point of failure.A Data Integrator feature that allows you to run unsuccessful jobs in recovery mode. Can restart from the point of failureMemory (Minimum/ Recommended) requirement at client machineMinimum 128 MB64 MBMin 256 MBMemory (Minimum/ Recommended) requirement at Server machineMinimum is 256 MB and depends on the nature of the mappings.

Each session of the PowerCenter takes around 8 MB of memory so according to the load the memory is needed.Minimum 256 MBPentium processor with a minimum of 256 MB but recommended 512 MBRAM and 100 MB free disk space (memory-intensive jobs require morefree disk space).Repository Backup and RecoveryPowerCenter comes with good features for backup and recovery of the repository. This can done through Repository Manager.Supports distributed Repository - Remote sites can subscribe to a set of meta data objects within the warehouse application.

These sites are notified via email when meta data changes occur within their subscription. DataStage XE offers version control such as table definitions, transformation rules, and source/target column mappings within a 2-part numbering scheme.You can also export an entire repository to a file. When you export or import a repository, jobs and their objects (created in Data Integrator) are automatically exported or imported as well. Source: ETL Tool Comparison, 2009ReferencesETL Tool Comparison.

(2009, April). Retrieved November 15, 2014, from dwhnotes.com: dwhnotes.com/wp-content/uploads/2009/04/etl_tool-comparision.xlsAmerican Airlines. (2011, November). History of AMR Corporation and American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/amrcorp/corporateInformation/facts/history.jspAmerican Airlines. (n.d.). About American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/aboutUs/corporateResponsibility/company/profile-and-fast-facts.

jspPassionned Group. (2014). ETL Tools Comparison. Retrieved November 15, 2014, from etltool.com: http://www.etltool.com/etl-tools-comparison/Sybase. (2014). American Airlines. Retrieved November 15, 2014, from sybase.com: http://www.sybase.com/detail?id=210272

W5 asig ETL and Data Warehousing - Essay Example

Extract of sample "W5 asig ETL and Data Warehousing"

CHECK THESE SAMPLES OF W5 asig ETL and Data Warehousing

Internal Analysis for Costco in the US

Global Transportation Providers APL; OHL; COSCO

Operations Management Plan for a Company

Detailed Analysis of the Market Conditions for the Happy Christmas Company

Commerce in Arab Countries

To be Launched in USA

The Transformation of Physical Distribution in Retailing of TESCO

The Relation to Business