
The Life Cycle of a Data Mining Project - Assignment Example


Summary

The paper "The Life Cycle of a Data Mining Project" argues that, to provide a framework for organizing the work an organization requires and to deliver a clear understanding of its big data, it is essential to think of the work as a cycle consisting of different stages…


Subject: Information Technology
Type: Assignment
Level: Undergraduate
Pages: 5 (1250 words)

Extract of sample "The Life Cycle of a Data Mining Project"

Data mining projects are complex and can have a high failure rate. A defined life cycle improves project management and raises the success rate of such projects, which depends largely on the team's ability to follow each step in the cycle. The project life cycle gives the project a structured outline (Ristoski & Paulheim, 2016). It lets everyone working on the project see how it is progressing. Each phase has clearly defined tasks and outputs, which gives the team a common strategy to follow in working toward the set goals. The life cycle of a data mining project aims to support the entire technical team, academic researchers, and IT managers. It also helps improve the success rate of the process and supports strategic decisions (Cashman et al., 2016). In this paper, I will examine the six phases of the data mining project life cycle. The six phases are:

Data creation

This is the first phase of the cycle. At this stage, the technical team and managers seek to determine how data enters the enterprise. When employees create a file, come up with a design, compile research results in a spreadsheet, or receive data through forms on the company website, or when data is created in any other way, that information automatically becomes part of the company's data (Lachmayer & Gottwald, 2015). The information then resides on the company's servers, in the cloud, or in a hosted data center.

In this stage, the experts need to query existing databases, using technical skills such as SQL (for example, with MySQL). The personnel may also receive data in file formats such as Microsoft Excel. If the company uses R or Python, the team has specific packages for reading data from different sources directly into its data science programs (Talburt & Zhou, 2015). Different types of databases may appear, such as PostgreSQL, Oracle, or non-relational (NoSQL) databases. Another way to obtain the required data is to scrape it from the organization's website with scraping tools such as Beautiful Soup. A further commonly used option is connecting to web APIs. Social media platforms such as Twitter and Facebook allow users to connect to their web servers and retrieve data (Ristoski & Paulheim, 2016). All the experts need to do is use the platform's web API to crawl the data. Although this phase is not common to all processed information, it is vital in cases where valuable data must be generated through collective reasoning. This type of analysis also applies to accounting, risk modeling, and investment decisions.
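The two most basic acquisition routes described above, querying a database and reading a received file, can be sketched as follows. This is a minimal illustration that uses only the Python standard library. The `customers` table, its columns, and the sample records are hypothetical, and an in-memory SQLite database stands in for whatever MySQL, PostgreSQL, or Oracle server a company actually runs.

```python
import csv
import io
import sqlite3


def load_from_database(conn):
    """Query an existing database and return its rows as dictionaries."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
    return [dict(row) for row in cursor.fetchall()]


def load_from_csv(text):
    """Read records received as a CSV export, e.g. saved from a spreadsheet."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical in-memory database standing in for the company's server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Austin"), ("Grace", "Boston")],
)

db_rows = load_from_database(conn)
csv_rows = load_from_csv("id,name,city\n3,Linus,Denver\n")
```

Values read from a CSV file arrive as text, so the `id` in `csv_rows` is the string `"3"`, while the database returns integers. Reconciling such differences belongs to the maintenance phase.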

Data maintenance

After obtaining data, the next task is scrubbing it. This stage covers a broad range of management actions, including how data is supplied to end users and how analytics such as modeling take place. The purpose of this stage is to clean and filter data: to produce usable data, it is vital to filter out and eliminate unnecessary data. Data may need to be converted from one format to another and consolidated into a single standardized format. Where data is stored in multiple CSV files, experts need to combine those files into one repository for processing and analysis (Cashman et al., 2016).
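Combining several CSV files into one standardized repository might look like the sketch below. The function name `merge_csv_sources` and the sample files are hypothetical. The sketch assumes that the files describe the same kind of record but may differ in header capitalization, spacing, and column order.

```python
import csv
import io


def merge_csv_sources(sources, columns):
    """Combine several CSV texts into one list of rows with a shared column set."""
    merged = []
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            # Normalize header names so "Name", " name" and "name" all match.
            clean = {
                key.strip().lower(): (value or "").strip()
                for key, value in row.items()
                if key is not None
            }
            # Keep only the agreed columns, in the agreed order.
            merged.append({col: clean.get(col, "") for col in columns})
    return merged


# Two hypothetical exports with different header styles and column order.
file_a = "Name,City\nAda,Austin\n"
file_b = "city, name\nBoston,Grace\n"
repository = merge_csv_sources([file_a, file_b], columns=["name", "city"])
```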

Maintenance of data also involves extracting and replacing values. If experts find missing data or null values, the responsible individuals should replace them accordingly. Lastly, the team may need to split, merge, or extract columns. Take, for example, a place-of-origin field that contains both city and state: depending on the requirement, the team needs to either split or merge these data. Maintaining data is essential to keeping it in good health; it ensures that data rot does not progress to a catastrophic stage (Talburt & Zhou, 2015). That is why data maintenance is a vital stage in the data mining life cycle.
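The replacement and column-splitting steps can be illustrated with a short sketch. The helper names, the `origin` column, and the "City, State" layout are assumptions made for this example. Real data would need rules agreed with the team, for instance what default to use for a missing value.

```python
def fill_missing(rows, column, default):
    """Replace empty or missing values in one column with a default."""
    return [{**row, column: row.get(column) or default} for row in rows]


def split_origin(rows, column="origin"):
    """Split a combined "City, State" column into separate city and state columns."""
    result = []
    for row in rows:
        city, _, state = row[column].partition(",")
        new_row = {key: value for key, value in row.items() if key != column}
        new_row["city"] = city.strip()
        new_row["state"] = state.strip()
        result.append(new_row)
    return result


# Hypothetical records: the second one has no place of origin.
records = [
    {"name": "Ada", "origin": "Austin, TX"},
    {"name": "Grace", "origin": ""},
]
cleaned = split_origin(fill_missing(records, "origin", "Unknown"))
```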

Data usage

At the third stage of the cycle, data is used and moved around the enterprise. The data may itself be a service or product that the company offers. The biggest challenge at this stage is compliance and governance. Data from the maintenance phase supports the organization's activities: it can be processed, viewed, modified, and stored in the organization's files (Cashman et al., 2016). An audit trail should be kept and reviewed frequently to ensure that every modification of data is fully traceable. During data usage, readily available data can also be shared with outside organizations where necessary. Data is altered when a value stored in a computer is changed to a different value; if the changed data is stored on the same device, it is considered modified.
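One simple way to make modifications traceable is to record every change next to the data itself. The class below is a hypothetical sketch of that idea and is not a description of any particular governance product. Each write stores who changed which value, when, and what the old and new values were.

```python
from datetime import datetime, timezone


class AuditedStore:
    """A key-value store that logs every modification for later audits."""

    def __init__(self):
        self._values = {}
        self.audit_log = []

    def set(self, key, value, user):
        """Store a value and append an audit entry describing the change."""
        old = self._values.get(key)
        self._values[key] = value
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "key": key,
                "old": old,
                "new": value,
            }
        )

    def get(self, key):
        """Return the current value for a key."""
        return self._values[key]


store = AuditedStore()
store.set("balance", 100, user="alice")
store.set("balance", 120, user="bob")
```

After these two writes, `store.audit_log` holds two entries, and the second shows that `bob` changed `balance` from 100 to 120.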

In the current business environment, a manager often hands employees a set of data and asks them to make sense of it. It is up to the employees to identify the business questions and transform them into questions that data analysis can answer. To do this properly, employees need to inspect the data and its features. Different data types, such as categorical, numerical, nominal, and ordinal data, require different treatment. Next, staff members need to compute descriptive statistics to extract features and test significant variables. Correlation is often used to test significant variables (Ristoski & Paulheim, 2016). Lastly, experts use data visualization to identify significant trends and patterns in the data. Bar charts or line charts give a better picture of the data and help experts understand its value.
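The descriptive statistics and the correlation test mentioned above can be computed with a few lines of Python. This is a minimal sketch: the function names and the advertising and sales figures are hypothetical, and the Pearson coefficient is written out by hand so that the formula is visible.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for a numerical variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical figures: advertising spend and resulting sales.
spend = [1, 2, 3, 4]
sales = [2, 4, 6, 8]
summary = describe(spend)
relationship = pearson(spend, sales)
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth examining further, while a value near 0 suggests that the variable is of little use for a linear model.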

Data publication

This is the stage at which data can leave the enterprise. An organization can use the collected data to send investment statements or invoices to customers. More broadly, data publication is the practice of preparing particular data and releasing it for public use, so that anyone interested can use it as they wish. There is broad multidisciplinary consensus on the advantages of this practice (Lachmayer & Gottwald, 2015). The main objective is to elevate data to the status of a first-class research output. Ways of making the data available include:

posting data on a publicly accessible website,

publishing it as supplemental material associated with a research article, and

publishing a data paper about the dataset, which may take the form of a preprint, an article in a regular journal, or an article in a data journal dedicated to data papers (Cashman et al., 2016).

Publishing the data allows researchers both to make the data available to others and to have datasets cited in the same way as other research publications.

Data archiving

At this stage, the data is no longer in active use but is preserved for future purposes. It is removed from the active environment and moved to storage. Data archival involves copying data to an environment where it is stored for possible future needs (Cashman et al., 2016). Archived data is stored without maintenance or general use.
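Moving a file out of the active environment into compressed storage can be sketched as follows. The function name `archive_file` and the directory layout are hypothetical. A real archive would also need retention rules and an index of what was stored.

```python
import gzip
import shutil
from pathlib import Path


def archive_file(source, archive_dir):
    """Compress a file into the archive directory and remove the active copy."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / (source.name + ".gz")
    # Copy the content into a gzip-compressed file in the archive.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only after the copy has succeeded is the active file removed.
    source.unlink()
    return target
```

Calling `archive_file("active/orders.csv", "archive")` would leave `archive/orders.csv.gz` in place of the active file, assuming those paths exist.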

Data destruction

The volume of archived data grows steadily; even if the organization wants to keep the data forever, doing so might not be feasible. Compliance issues and storage costs put pressure on the enterprise to destroy unnecessary data. The process involves removing every copy of the data element from the organization (Lachmayer & Gottwald, 2015), typically from an archive storage location. The biggest challenge in this stage is making sure that data is destroyed properly. Many businesses today depend entirely on data, and where data is stored across a network, disposing of it or of electronic devices becomes more complicated. If data is not shredded or wiped correctly, some of it could leak and result in a data breach.
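Finding and removing every copy of a data element can be sketched by comparing file contents. The function names below are hypothetical, and the sketch covers only the bookkeeping side of the problem: deleting a file in this way does not securely wipe the underlying storage medium, which requires dedicated shredding or wiping tools.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def destroy_copies(reference, search_root):
    """Delete every file under search_root whose content matches the reference file."""
    wanted = file_digest(reference)
    removed = []
    # Collect the paths first so that deleting does not disturb the directory walk.
    for path in sorted(Path(search_root).rglob("*")):
        if path.is_file() and file_digest(path) == wanted:
            path.unlink()
            removed.append(path)
    return removed
```

The returned list of removed paths can serve as a record that the destruction took place, which supports the compliance requirement mentioned above.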

In conclusion, to provide a framework for organizing the work an organization requires and to deliver a clear understanding of its big data, it is essential to think of the work as a cycle consisting of different stages. These stages relate directly to one another, and each consists of specific tasks and outputs.
