Divisions and Categories of Business Intelligence Technologies Assignment

Business Intelligence Technologies

COMPONENT 1

Compare and contrast the process of Knowledge Discovery from Databases (KDD) with that of OLAP. [20 marks]

Knowledge Discovery in Databases (KDD) involves the discovery of novel patterns in massive data sets at the intersection of artificial intelligence and database systems. It aims to extract information from data and present it in a structure that humans can understand. The actual task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown but interesting patterns, for example groups of data records, unusual records and dependencies. These tasks normally involve specialized database techniques such as spatial indexes (Ling Liu & Tamer 2009). The patterns can be viewed as summaries of the input data and can be used in further analysis, for example in predictive analytics and machine learning. For instance, the data mining step may identify multiple groups in the data, which a decision support system can then use to obtain more accurate predictions.

The KDD process is generally defined by the following stages: selection, preprocessing, transformation, data mining, and interpretation. Many variations on this theme exist, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), which defines six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment (Ling Liu & Tamer 2009). A simplified variant has only three steps: pre-processing the data, mining the data, and validating the results.

Pre-processing begins with assembling a target data set. Since data mining can only uncover patterns that are actually present in the data, the target data set must be large enough to contain these patterns while remaining small enough to be mined within an acceptable timeframe.
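The five KDD stages named above (selection, preprocessing, transformation, data mining, interpretation) can be sketched as a chain of small functions. This is only a minimal illustration: the records, the function names and the 50% support threshold are all assumptions made up for the example, and the "mining" step is a simple pair-counting routine rather than any particular published algorithm.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, age, basket). None marks missing data.
RAW = [
    (1, 34, ["bread", "milk"]),
    (2, None, ["bread", "butter"]),
    (3, 51, ["bread", "milk", "butter"]),
    (4, 29, ["milk"]),
    (5, 42, ["bread", "milk"]),
]

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [(age, basket) for _, age, basket in records]

def preprocess(records):
    """Preprocessing: drop observations with missing values."""
    return [(age, basket) for age, basket in records if age is not None]

def transform(records):
    """Transformation: reduce each basket to a sorted tuple of distinct items."""
    return [tuple(sorted(set(basket))) for _, basket in records]

def mine(baskets, min_support=0.5):
    """Data mining: item pairs that co-occur in at least min_support of baskets."""
    pair_counts = Counter()
    for basket in baskets:
        for i, first in enumerate(basket):
            for second in basket[i + 1:]:
                pair_counts[(first, second)] += 1
    total = len(baskets)
    return {pair: count / total
            for pair, count in pair_counts.items()
            if count / total >= min_support}

def interpret(patterns):
    """Interpretation: render the patterns in a human-readable form."""
    return [f"{a} and {b} appear together in {support:.0%} of baskets"
            for (a, b), support in sorted(patterns.items())]

patterns = mine(transform(preprocess(select(RAW))))
report = interpret(patterns)
print(report)
```

Keeping each stage as its own function mirrors the way the process is described: a stage can be revisited and changed (for example, a different cleaning rule) without touching the others.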
Common data sources are data warehouses and data marts. Pre-processing is essential for analyzing multivariate data sets before data mining. The target set is then cleaned: data cleaning removes observations that contain noise and observations with missing data.

Data mining involves six general classes of tasks. The first is anomaly detection, the identification of unusual data records that may be interesting, or of data errors that require further investigation. The second is association rule learning, which searches for relationships between variables (Ling Liu & Tamer 2009). The third is clustering, which discovers groups of similar records without relying on structures already known in the data. The fourth is classification, which generalizes a known structure so that it can be applied to new data; for instance, an email program may attempt to classify new emails as legitimate or spam. The fifth is regression, which attempts to find a function that models the data with the least error. The sixth is summarization, which provides a more compact representation of the data set, including visualization and report generation.

The last step of knowledge discovery is to verify that the patterns produced by the data mining algorithms also occur in the wider data set. Not all patterns found by data mining algorithms are valid (Andrew 2011). For instance, an algorithm may find patterns in the training set that are not present in the general data; this is referred to as over-fitting. To guard against it, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the resulting output is compared with the desired output.
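The idea of checking learned patterns against data the algorithm has not seen can be illustrated with a toy spam classifier. The sketch below is an assumption-laden example, not a real mail filter: the emails, the word-scoring rule and the function names are invented for illustration. It shows how accuracy on the training set can look better than accuracy on a held-out test set.

```python
from collections import Counter

# Hypothetical labeled emails: (text, is_spam).
TRAIN = [
    ("win money now", True),
    ("cheap money offer", True),
    ("meeting agenda attached", False),
    ("lunch meeting tomorrow", False),
]
TEST = [
    ("win a cheap offer", True),
    ("agenda for tomorrow", False),
    ("money for lunch", False),
]

def train(examples):
    """Learn a score per word: +1 per spam occurrence, -1 per legitimate one."""
    scores = Counter()
    for text, is_spam in examples:
        for word in text.split():
            scores[word] += 1 if is_spam else -1
    return scores

def predict(scores, text):
    """Classify as spam when the summed word scores are positive."""
    return sum(scores[word] for word in text.split()) > 0

def accuracy(scores, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(scores, text) == label for text, label in examples)
    return correct / len(examples)

scores = train(TRAIN)
train_acc = accuracy(scores, TRAIN)  # measured on data the model has seen
test_acc = accuracy(scores, TEST)    # measured on held-out data
print(train_acc, test_acc)
```

Here the rule "money means spam" fits every training email but misclassifies the legitimate test email "money for lunch", which is exactly the kind of pattern that holds in the training set but not in the general data.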
For instance, a data mining algorithm that attempts to distinguish genuine emails from spam would be trained on a training set of sample emails. Once trained, the learned patterns are applied to a test set of emails on which the algorithm was not trained. The accuracy of the patterns can then be measured by the number of emails they classify correctly. Several methods can be used to assess the algorithm, such as ROC curves, to determine whether the learned patterns meet the required standards. If they do not, the pre-processing and data mining steps should be re-evaluated and modified. If they do, the final step is to interpret the learned patterns and turn them into knowledge.

Online analytical processing (OLAP) is an approach to answering multi-dimensional analytical (MDA) queries swiftly. The term OLAP was coined as a modification of the traditional database term OLTP (Online Transaction Processing). OLAP is a component of the broader category of business intelligence, which also includes relational reporting and data mining. Typical applications of OLAP include business sales reporting, management reporting, marketing, budgeting and forecasting, business process management (BPM), and financial reporting.

OLAP tools help users analyze multidimensional data effectively from various perspectives. OLAP consists of three basic analytical operations: consolidation, drill-down, and slicing and dicing. First, consolidation is the aggregation of data that can be accumulated and computed in one or more dimensions (Daniel 2007). For instance, all sales offices are rolled up to the sales department or sales division. Secondly, drill-down is the opposite of consolidation. It enables the user to navigate down through the details.
In this way, users can view the sales of the individual products that make up a region's sales. Thirdly, slicing and dicing lets users extract (slice) a specific set of data from the cube and view (dice) the slices from different viewpoints.

The core of an OLAP system is the OLAP cube. The cube consists of measures that are categorized by dimensions. Its metadata is typically created from a star schema or snowflake schema of tables in a relational database (Daniel 2007). Measures are derived from the records in the fact table, whereas dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or metadata, associated with it, and a dimension is what describes these labels and provides information about the measure.

A multidimensional structure is a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. The structure is broken into cubes, and the cubes are able to store and access data within their own confines. Each cell in a cube holds aggregated data related to elements along each of its dimensions. Even when the data is manipulated, it remains easy to access and interrelated, and the database format stays compact. The multidimensional structure is commonly used in analytical databases that serve OLAP applications (Daniel 2007). Analytical databases use such structures because of their ability to deliver answers to complex business queries swiftly.

Aggregations are the most important mechanism in OLAP. They are built from the fact table by changing the granularity on specific dimensions and aggregating the data along those dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities, and together these aggregations and the base data contain the answers to every query that can be answered from the data.
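The three OLAP operations and the role of aggregation can be sketched over a tiny fact table. Everything in this example is hypothetical: the regions, offices, products and sales figures are invented, and the helper functions are a plain-Python stand-in for what an OLAP engine would do against a real cube.

```python
from collections import defaultdict

# Hypothetical fact table: the measure is "sales"; the dimensions are
# location (with hierarchy office -> region) and product.
FACTS = [
    {"region": "North", "office": "Leeds", "product": "Tea", "sales": 120},
    {"region": "North", "office": "Leeds", "product": "Coffee", "sales": 80},
    {"region": "North", "office": "York", "product": "Tea", "sales": 50},
    {"region": "South", "office": "Bristol", "product": "Tea", "sales": 70},
    {"region": "South", "office": "Bristol", "product": "Coffee", "sales": 90},
]

def aggregate(rows, dims):
    """Sum the sales measure, grouped by the given dimension levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dim] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose dimension values fall within the allowed sets."""
    return [row for row in rows
            if all(row[d] in values for d, values in allowed.items())]

# Consolidation (roll-up): offices are aggregated up to their region.
by_region = aggregate(FACTS, ["region"])

# Drill-down: move to a finer granularity, from region to office.
by_office = aggregate(FACTS, ["region", "office"])

# Drill-down into the products that make up one region's sales.
north_by_product = aggregate(slice_(FACTS, "region", "North"), ["product"])

# Dice: restrict two dimensions at once, then view the result by office.
north_tea = aggregate(dice(FACTS, region={"North"}, product={"Tea"}), ["office"])

print(by_region, by_office, north_by_product, north_tea)
```

Each call to `aggregate` with a different list of dimension levels corresponds to one possible aggregation of the fact table, which is why the number of aggregations grows with the number of granularity combinations.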
As explained above, OLAP and data mining are two of the most common business intelligence technologies; both operate on data to obtain intelligence. Their main difference lies in how they operate on data. Data mining focuses on patterns, ratios and influences in a data set, whereas OLAP tools provide multidimensional data analysis and summaries of the data. In other words, OLAP deals with aggregation, which boils down to operating on data through 'addition', whereas data mining corresponds to 'division'. Another notable difference is that OLAP performs comparison and contrast along business dimensions in real time, whereas data mining applications model the data and return rules that can be acted upon (Andrew 2011).

COMPONENT 2

Compare and contrast the main Business Intelligence technologies, including in your answer a definition of the main divisions or categories in these technologies. Examples of technologies: Pentaho, WEKA, IBM SPSS Modeler, CrimeStat. [20 marks]

Business intelligence (BI) refers to the computer-based techniques used to identify, extract, and analyze business data, for example sales revenue by department or product, or associated costs and incomes. BI technologies provide historical, current and predictive views of business operations. Common functions of BI technologies are reporting, analytics, online analytical processing, text mining, process mining, business performance management, complex event processing, benchmarking, and predictive analytics. BI aims to support better business decision making; thus, a BI system can also be called a decision support system (DSS). The term business intelligence is also sometimes used synonymously with competitive intelligence, since both support decision making.
However, there are differences between BI and competitive intelligence. BI uses technologies, processes and applications to analyze internal, structured data and business processes. Competitive intelligence, on the other hand, collects, analyzes and distributes information with a central focus on the company's competitors.

SPSS is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), text analytics, data mining (IBM SPSS Modeler), statistical analysis, and collaboration and deployment (batch and automated scoring services). Many of its features can be accessed through pull-down menus or programmed in a proprietary 4GL command syntax language. Command syntax programming has the benefits of reproducibility, simplification of repetitive tasks, and the handling of complex data manipulations and analyses (O’Brien & Marakas 2009). Furthermore, some complex applications can only be programmed in syntax and are not accessible through the menu structure. The pull-down menu interface also generates command syntax, which can be displayed in the output, although the default settings have to be changed to make the syntax visible to the user. The generated syntax can also be pasted into a syntax file. In addition, a "macro" language can be used to write command language subroutines, and a Python programmability extension can access the information in the data dictionary. This Python programmability extension, introduced in SPSS 14, replaced the less functional SAX Basic "scripts", although SAX Basic remains available. The Python extension also allows SPSS to run any of the statistics in the free software package R. From version 14 onwards, SPSS can be driven externally by a Python or a VB.NET program.
WEKA is a data mining package that brings together a range of different methods. Its interface makes it accessible and generally easy to use, and its extensibility and flexibility make it well suited to academic use (O’Brien & Marakas 2009). Its API allows algorithms to be ported from other programs. WEKA is written in Java and is distributed under the GNU General Public License (GPL).

CrimeStat III is a spatial statistics program for the analysis of crime incident locations. It was developed by Ned Levine & Associates under the direction of Ned Levine, with funding from grants by the National Institute of Justice. The program was created mainly to provide supplemental statistical tools that help criminal justice researchers and law enforcement agencies map crime. Such agencies include many police departments around the world. The program reads crime locations in 'shp', 'dbf', ASCII or ODBC-compliant formats, using either spherical or projected coordinates. It calculates various spatial statistics and writes graphical objects to MapInfo, ArcGIS, Surfer for Windows, and other GIS packages.

CrimeStat's data input is organized into several sections. The first is the Primary file, a file of crime locations with X and Y coordinates (O’Brien & Marakas 2009). Each crime can also have an associated time value. The coordinate system can be either spherical or projected. The second is the Secondary file, an associated file of crime locations. It also has X and Y coordinates, and it is used for comparison with the primary file in dual kernel interpolation and risk-adjusted nearest neighbor clustering. The third is the Reference file, a grid file that covers the study area. It is usually a regular grid, although irregular grids can be imported. Another section is the Measurement parameters.
These parameters specify the type of distance measurement to be used and define the extent of the study area. CrimeStat III can also use a network to link separate points. Each link can be weighted by the cost of travel, for instance speed, time or distance. This makes the estimated connections between incident locations more realistic.

References

Andrew, A. 2011, "Difference Between Data Mining and OLAP", DifferenceBetween.com. Retrieved 10 March 2012 from http://www.differencebetween.com/difference-between-data-mining-and-vs-olap/
Daniel, L. 2007, "Data Warehousing and OLAP - A Research-Oriented Bibliography".
Erik, T. 1997, OLAP Solutions: Building Multidimensional Information Systems, 2nd edition, John Wiley & Sons.
Ling Liu & Tamer, M. (eds.) 2009, "Encyclopedia of Database Systems".
O’Brien, J. & Marakas, G. M. 2009, Management Information Systems, 9th edition, Boston, MA: McGraw-Hill/Irwin.
Roberto, B. & Mauro, B. 2011, Reactive Business Intelligence: From Data to Models to Insight, Reactive Search Srl, Italy.
