Data Mining and Prediction Modeling in Health Care
Background Information

Sickle Cell Anemia (SCA) is a lifelong, hereditary hematological disorder that results from an abnormality in the oxygen-carrying hemoglobin molecules of an individual's red blood cells (American Accreditation Health Care Commission [AAHCC], 2013). The abnormality results from a mutation in the inherited hemoglobin gene. The mutated gene produces sickle hemoglobin (HbS), which converts the soft, rounded, biconcave, disk-shaped red blood cells into rigid, brittle, half-moon-shaped cells (Solanki, 2014). Although the disorder is genetic, individuals who inherit the sickle gene from only one parent have the HbAS genotype (sickle cell trait) and are usually asymptomatic: they carry the sickle cell gene in their blood, but the disorder does not manifest phenotypically. Such people are termed carriers (Solanki, 2014).

Statement of Problem

SCA increases a person's susceptibility to infections and disease-related complications, and patients experience episodes of intense pain (AAHCC, 2013). In some cases, SCA can be fatal because acute oxygen depletion leads to organ failure (AAHCC, 2013). Unlike other forms of anemia, which can be cured or alleviated with diets rich in iron and vitamins B12 and C, SCA can neither be cured nor alleviated with food (Dampier et al., 2011). Close to 300,000 children are born with a subtype of SCA annually, and many of them do not live beyond the age of five because of complications arising from their increased vulnerability to related diseases (Dampier et al., 2011). Fortunately, recent research has focused on disease-modifying drugs and proposed curative strategies and therapies that can minimize morbidity and improve prognosis (Maakaron, 2014).
Nonetheless, there is an acute dearth of efficient data collection and data mining systems in this field. Such practices can turn massive data sets into useful knowledge, which in turn supports the development of disease-modifying drugs and curative strategies and therapies. In doing so, they could also reduce medical expenditure, reduce morbidity among SCA patients, and improve the quality of patient care.

Significance of Study

As mentioned earlier, SCA is a lifelong disorder. An accurate prognosis is therefore very important if SCA patients are to live normal lives with minimal morbidity. Improving SCA prognosis through data mining and predictive modeling techniques such as Classification and Regression Trees (CART) has the potential to improve the lives of SCA patients (Berk, 2008). Once resources are channeled towards prognostic strategies, the average lifespan of SCA patients is likely to increase. Statistics indicate that in 1973 the average lifespan of patients with SCA was only 14 years (Lewis, 2000). Owing to technological advances and growing investment in medical research and development, the average lifespan of SCA patients is now 48 years for women and 42 years for men (Lewis, 2000). Few studies have examined data mining in relation to SCA, its related complications, and SCA patients' health-related quality of life. This study aims to strengthen disease management strategies and allow for early detection of SCA-related complications. Data mining of SCA-related information will benefit not only patients but also medical practitioners. Through data mining and CART systems, well-structured, adequately defined, and reliable clinical decision rules can be developed (Loh, 2011).
These rules will play a major role in ensuring that new patients are appropriately classified into clinically important categories. Easier patient classification will in turn support sound decisions about treatment or hospitalization, even in emergency scenarios. This should reduce the instances of ethical dilemmas, which have been shown to cause moral distress among medical practitioners. Moreover, computer-assisted analysis through CART systems and data mining programs can convert the data collected by health care institutions into useful pointers (Loh, 2011). Computer-aided data mining allows for the integration, synthesis, and synchronization of raw health data that is highly uncertain, high-dimensional, and widely distributed. This can help uncover previously unknown and unexpected prognostic dynamics, so that proper patient care systems can be established and SCA patients and their caregivers are placed under minimal stress.

Elements of the Health Care System

To build a proper prognostic system, it is paramount to identify the main predictor factors. Since the prognostic system at hand focuses on SCA, the main predictors include pregnancy, dactylitis, hemoglobin levels, and White Blood Cell (WBC) count (Pekelis, 2013). In pregnancy, the rates of fetal loss, premature births, and underweight children are critical factors to look out for in SCA prognosis. For hand-foot syndrome (dactylitis), the system will focus on infants below the age of one year. For hemoglobin levels, it will focus on patients with Hb levels below 7 g/dL. Finally, for WBC count, it will focus on patients showing signs of leukocytosis even in the absence of an infection.
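The four predictor factors above can be expressed as simple yes/no features before any tree is built. The sketch below is illustrative only: the `PatientRecord` layout, its field names, the `predictor_flags` helper and the leukocytosis cut-off are hypothetical assumptions, since the paper specifies only the age-one boundary for dactylitis and the 7 g/dL hemoglobin threshold.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed leukocytosis cut-off in x10^9 cells/L. The paper gives no number,
# and paediatric reference ranges vary with age, so this is a placeholder.
LEUKOCYTOSIS_CUTOFF = 11.0

# Hemoglobin threshold stated in the text (g/dL).
LOW_HB_CUTOFF = 7.0


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are illustrative only."""
    dactylitis_onset_years: Optional[float]  # None if no dactylitis recorded
    baseline_hb_g_dl: float
    wbc_10e9_per_l: float
    active_infection: bool
    pregnant: bool = False


def predictor_flags(p: PatientRecord) -> Dict[str, bool]:
    """Turn a raw record into the four predictor factors named in the text."""
    return {
        # Hand-foot syndrome first seen before the age of one year.
        "early_dactylitis": p.dactylitis_onset_years is not None
        and p.dactylitis_onset_years < 1.0,
        # Baseline hemoglobin below 7 g/dL.
        "low_baseline_hb": p.baseline_hb_g_dl < LOW_HB_CUTOFF,
        # Raised WBC count with no infection to explain it.
        "unexplained_leukocytosis": p.wbc_10e9_per_l > LEUKOCYTOSIS_CUTOFF
        and not p.active_infection,
        # Pregnancy, where fetal loss, prematurity and low birth weight are tracked.
        "pregnancy": p.pregnant,
    }


infant = PatientRecord(dactylitis_onset_years=0.6, baseline_hb_g_dl=6.5,
                       wbc_10e9_per_l=15.0, active_infection=False)
print(predictor_flags(infant))
```

For the example infant, the three clinical flags come out true and the pregnancy flag false; these flags are the kind of input a tree would then split on.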
These parameters were selected for the following reasons. First, hand-foot syndrome (dactylitis) mainly affects children under five years of age. Identifying children who develop the syndrome before the age of one year ensures that they are placed under appropriate care and medication, reducing their vulnerability to infections and SCA-related diseases. In short, hand-foot syndrome is a significant indicator of SCA severity, since children who have it before the age of one year are most likely to have a severe clinical course. Second, hemoglobin levels are a significant predictor because the disorder detrimentally affects hemoglobin (Hb) levels. If a child records a baseline Hb level below 7 g/dL, there is a high probability that the child will suffer from severe SCA in future (Pekelis, 2013). Once such low Hb levels are recorded, the child will be placed under medical scrutiny to keep future morbidity to a minimum. Finally, the WBC count indicates the level of immune defense activity in a patient's body (Solanki, 2014). If a child's WBC count is higher than normal in the absence of an infection, it is likely that the immune system is responding to the changes brought about by the sickle hemoglobin (HbS) gene. Such a child should be placed under medical scrutiny if a severe or fatal case of SCA is to be avoided. The selection of these parameters was based not only on health facts but also on statistics from research studies conducted in different parts of the world. According to Maakaron (2014) of Medscape, SCA mortality is very high during childhood. Reasons for this, particularly in Sub-Saharan Africa, include lack of diagnosis, misdiagnosis, and the sporadic and insufficient nature of data on child mortality.
In other cases, death from SCA, a disorder dubbed "the suffering," is considered taboo, so it is attributed to other diseases such as malaria. This corrupts infant mortality data and hinders prognostic studies, contributing to high child mortality from SCA. Data from the Brazilian National Newborn Screening Program indicate that of the 3,500 children born with SCA annually, 20 percent die of it before the age of five (Maakaron, 2014). The infant mortality rate due to SCA was 25 percent until the Rio de Janeiro Blood Center highlighted the importance of SCA prognosis. When the organization initiated a program that provided proper treatment, attention, and care to children with SCA, the infant mortality rate due to SCA fell to just 2.5 percent. Additionally, a 1995 report from the Cooperative Study of Sickle Cell Disease (CSSCD) indicates that the introduction of penicillin prophylaxis and pneumococcal vaccination for children with SCA or SCA traits reduced the instances of acute chest syndrome, which was the main fatal complication associated with SCA. Early diagnosis and treatment boosted survival rates, which justifies prognostic efforts focused on children.

To cope with numerous features interacting in complicated, non-linear ways, the suggested system will use a recursive partitioning design. Prediction trees, as the name suggests, use tree-like algorithms to represent the recursive partitioning of the data into smaller, hierarchical regions (Maakaron, 2014). Each terminal node, or leaf, of the tree corresponds to a particular cell of the partition, and a point x belongs to a leaf if x falls in the corresponding cell. To find the cell for a given point, one starts at the root node and answers a sequence of questions about the point's characteristics.
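The root-to-leaf search described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the proposed system: `Leaf`, `Node`, `find_leaf` and the two-question `example_tree` are hypothetical, and the thresholds are simply the predictor cut-offs named earlier in this paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Tuple, Union


@dataclass
class Leaf:
    """Terminal node: every point that reaches it falls in the same cell."""
    label: str


@dataclass
class Node:
    """Interior node labelled with a yes/no question about the point x."""
    question: str
    test: Callable[[Mapping[str, float]], bool]
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def find_leaf(tree: Tree, x: Mapping[str, float]) -> Tuple[Leaf, List[str]]:
    """Start at the root and answer questions until a leaf (cell) is reached."""
    path: List[str] = []
    while isinstance(tree, Node):
        answer = tree.test(x)
        path.append(f"{tree.question} {'yes' if answer else 'no'}")
        tree = tree.yes if answer else tree.no
    return tree, path


# Hypothetical two-question tree built from predictors named in the text;
# the leaf labels are illustrative, not clinical guidance.
example_tree = Node(
    "Baseline Hb < 7 g/dL?", lambda x: x["hb"] < 7.0,
    yes=Leaf("closer monitoring"),
    no=Node("Dactylitis before age 1?", lambda x: x["dactylitis_before_1"] == 1,
            yes=Leaf("closer monitoring"),
            no=Leaf("routine follow-up")),
)

leaf, path = find_leaf(example_tree, {"hb": 8.1, "dactylitis_before_1": 1})
print(leaf.label, path)
```

The returned path lists each question and its answer, so the route that placed a patient in a given cell can be read back and checked.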
In a CART tree, each interior node is labeled with a question, and the possible answers label the branches leading to its child nodes. The answer given at one node therefore determines which question is asked next. Health care data contain many predictor variables, so the suggested system must be able to make multiple comparisons across different data sets. Since different groups of patients show different degrees of variance and variation, the system must also accommodate predictor variables with arbitrary distributions (Lewis, 2000). For instance, the value of variable A, such as a patient's age, may greatly affect the importance of variable B, such as the same patient's weight. As the number of interactions between variables increases, they become more challenging to model (Lewis, 2000). Alongside CART analysis, multivariate logistic regression models can be used to project a patient's probability of disease. This probability is calculated from pre-recorded patient characteristics and regression coefficients, and it replaces the usual "high risk" versus "low risk" dichotomy of current clinical practice with a graded probability. By combining CART analysis with existing health data, clinical decision rule frameworks can be built from large data sets. For every patient in a particular data set, the dependent variable is taken from the patient's medical history: whether or not the patient has a history of the condition that medical practitioners hope to predict accurately in other patients, in this case SCA (Loh, 2011). The remaining variables are patient characteristics that may help predict the value of the dependent variable, such as elevated WBC counts and hemoglobin levels below 7 g/dL.
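The probability calculation mentioned above is the standard logistic model, p = 1 / (1 + exp(-(b0 + b1x1 + ... + bkxk))). Logistic regression is a separate technique from CART; the sketch below only shows how coefficients and recorded characteristics combine into a probability. The helper `logistic_probability` and the coefficients in `B` are made up for illustration and are not estimates from any SCA data.

```python
import math
from typing import Mapping


def logistic_probability(intercept: float,
                         coefficients: Mapping[str, float],
                         x: Mapping[str, float]) -> float:
    """P(disease | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * x[name] for name, b in coefficients.items())
    # Evaluate the sigmoid in a form that cannot overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up coefficients for three binary (0/1) predictors; a real model would
# estimate these from the data set by maximum-likelihood fitting.
B0 = -3.0
B = {"low_baseline_hb": 1.5, "early_dactylitis": 1.2, "unexplained_leukocytosis": 0.8}

patient = {"low_baseline_hb": 1, "early_dactylitis": 1, "unexplained_leukocytosis": 1}
print(round(logistic_probability(B0, B, patient), 3))
```

For a patient with all three flags set, the linear predictor is -3.0 + 3.5 = 0.5 and the sketch returns a probability of about 0.62, rather than a bare "high risk" label.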
For instance, if a medical practitioner wished to predict the likelihood that a patient has SCA, a possible predictor variable could be a sudden elevation in WBC count or sudden weight loss in a patient with no record of recent infections.

Advantages of Using CART Analysis in SCA Prognostic Systems

Medical analysts can deploy a wide array of methods to create prognostic programmes for the early detection and prediction of SCA (Loh, 2011). The CART system of analysis, however, can make accurate predictions from a massive data set based on a few simple if-then conditions. This system has a number of advantages, discussed below. First, its results are very simple (Loh, 2011). Its outputs, such as survival categories, surgical urgency, or the presence of myocardial infarction, are easy to read. This simplicity makes it very useful in health care, where rapid patient classification is required, especially in emergencies. With this system of analysis, the practitioner evaluates only one or two conditions at a time, which is much easier than computing classification scores from all the data (Loh, 2011). The output is also a set of simple if-then statements rather than the complex non-linear model equations produced by other analysis methods. Because the output is a series of simple if-then conditions called tree nodes, there is no assumption that the relationship between the predictor variables and the dependent variable is linear, follows a specific non-linear link function, or is monotonic (Pekelis, 2013). The CART system is therefore non-linear and non-parametric.
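To make the if-then idea concrete, here is a sketch of how a tree-growing algorithm in the CART family can pick a single split: it tries every threshold on one predictor and keeps the one with the lowest weighted Gini impurity. This is a deliberate simplification under stated assumptions; a full CART implementation searches all predictors, recurses on each side, and then prunes the tree. The helpers `gini` and `best_split` and the Hb values and outcomes are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float],
               labels: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Try every rule 'value <= t'; return (t, weighted impurity) of the best t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (t, impurity)
    return best


# Hypothetical baseline Hb values (g/dL) and outcomes (1 = severe course).
hb = [5.8, 6.2, 6.9, 7.4, 8.0, 9.1]
severe = [1, 1, 1, 0, 0, 0]
t, impurity = best_split(hb, severe)
print(f"if Hb <= {t:.2f} g/dL then severe course likely, else not")
```

The result is an if-then rule (here, a cut between 6.9 and 7.4 g/dL), and finding it required no assumption of a linear or monotonic relationship between Hb and outcome.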
This makes it suitable for data mining tasks, since there is little prior knowledge on the subject matter and no coherent set of theories related to SCA. Moreover, interest in CART analysis has grown over the last decade because it uncovers some of the interactions between predictor variables, and its popularity has come to surpass that of other traditional techniques.

Disadvantages of Using CART Analysis in SCA Prognostic Systems

The CART system also has shortcomings. One major issue arises when it is applied to real data, which is much more random than anticipated: it becomes very difficult to decide when to stop splitting the data (Solanki, 2014). For example, with 10 SCA cases, up to 9 if-then conditions can be developed so that every single case is predicted correctly. Continuous splitting allows analysts to reproduce the training cases exactly and hence predict the most probable outcomes for those cases (Solanki, 2014). However, a tree that reproduces the training data perfectly is not guaranteed to predict new patients equally well, which increases the risks captured in the decision cost matrix. The decision cost matrix outlines the costs associated with misclassifying a new patient. Classifying patients with emergent health conditions as non-urgent produces larger errors than misclassifying patients with non-urgent conditions as urgent. Additionally, many statisticians lack adequate knowledge of how CART analysis works, which has hindered its acceptance and the public credibility of its output (Solanki, 2014). Until recently, CART analysis was difficult to use, so most practitioners preferred other traditional techniques.
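The two problems described in the disadvantages above, memorising the training cases and weighing unequal errors, can be illustrated in a few lines. The sketch assumes a single hypothetical numeric predictor, a naive grower (`grow_until_pure`) with no stopping rule or pruning, and a made-up cost matrix (`COST`, used by the hypothetical `min_cost_label`) in which missing an emergent patient costs ten times as much as over-triaging a non-urgent one; none of these numbers come from the sources cited here.

```python
from typing import Dict, Sequence, Tuple


def grow_until_pure(values: Sequence[float], labels: Sequence[int]) -> int:
    """Keep splitting a single numeric predictor until every leaf is pure.

    Returns the number of if-then splits used. Deliberately naive: it cuts at
    the first place where the label changes, with no stopping rule or pruning.
    """
    pairs = sorted(zip(values, labels))
    if len({lab for _, lab in pairs}) <= 1:
        return 0  # pure leaf
    for i in range(1, len(pairs)):
        if pairs[i][1] != pairs[i - 1][1] and pairs[i][0] != pairs[i - 1][0]:
            left, right = pairs[:i], pairs[i:]
            return (1 + grow_until_pure([v for v, _ in left], [c for _, c in left])
                    + grow_until_pure([v for v, _ in right], [c for _, c in right]))
    return 0  # labels change only between tied values; cannot be separated


# Hypothetical misclassification costs, keyed by (true class, predicted class).
COST: Dict[Tuple[str, str], float] = {
    ("emergent", "emergent"): 0.0,
    ("emergent", "non-urgent"): 10.0,
    ("non-urgent", "emergent"): 1.0,
    ("non-urgent", "non-urgent"): 0.0,
}


def min_cost_label(p_emergent: float) -> str:
    """Pick the prediction with the lowest expected cost given P(emergent)."""
    expected = {
        pred: p_emergent * COST[("emergent", pred)]
        + (1.0 - p_emergent) * COST[("non-urgent", pred)]
        for pred in ("emergent", "non-urgent")
    }
    return min(expected, key=lambda pred: expected[pred])


# Ten cases whose outcome alternates with the predictor: a fully grown tree
# memorises them with nine splits, one fewer than the number of cases.
print(grow_until_pure(list(range(10)), [i % 2 for i in range(10)]))
print(min_cost_label(0.2), min_cost_label(0.05))
```

With these assumed costs, the minimum-cost rule predicts "emergent" whenever P(emergent) exceeds 1/11 (about 0.09), because 10p > 1 - p exactly when p > 1/11. Standard CART practice limits memorisation by growing a large tree and then pruning it back with cross-validated cost-complexity pruning.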
The relative novelty of CART analysis has made it difficult to find statisticians with proficient expertise in it, so people willing to use CART have struggled to find advisors and assistance. Since it is not considered a standard analysis technique, CART has been excluded from many statistical software packages such as SAS.

The Classification and Regression Tree (CART) analysis is a highly potent system, especially in the clinical research arena. It can be easily integrated into the operations and databases of health care organizations, since its uses are diverse and extremely beneficial, especially in prognostic studies of Sickle Cell Anemia. Using classification algorithms, medical analysts and practitioners can continuously analyse blood samples with respect to age and create prediction models for early diagnosis, thus reducing morbidity in SCA patients. The application of CART will also play a significant role in patient classification, which in turn streamlines health institutions' operations.

References

American Accreditation Health Care Commission. (2013). Sickle cell anemia. Health Guide, New York Times. Retrieved June 12, 2015, from http://www.nytimes.com/health/guides/disease/sickle-cell-anemia/prognosis.html
Berk, R. A. (2008). Statistical learning from a regression perspective. Springer Series in Statistics. New York: Springer-Verlag.
Dampier, C. K., LeBeau, P. H., Rhee, S. T., Lieff, S. B., Kesler, K. T., Ballas, S. J., & Comprehensive Sickle Cell Centers (CSCC) Clinical Trial Consortium (CTC) Site Investigators. (2011). Health-related quality of life in adults with sickle cell disease (SCD): A report from the Comprehensive Sickle Cell Centers Clinical Trial Consortium. American Journal of Hematology, 86(2), 203-205. doi:10.1002/ajh.21905
Lewis, R. J. (2000). An introduction to Classification and Regression Trees (CART) analysis. Harbor-UCLA Medical Center, Department of Emergency Medicine, 1(1), 1-13. Retrieved June 12, 2015, from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.4103&rep=rep1&type=pdf
Loh, W.-Y. (2011). Classification and regression trees. WIREs Data Mining and Knowledge Discovery, 1(1), 14-23. John Wiley & Sons. Retrieved June 12, 2015, from http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf
Maakaron, J. K. (2014). Sickle cell anemia. Medscape Drugs and Diseases. Retrieved June 12, 2015, from http://emedicine.medscape.com/article/205926-overview#aw2aab6b2b7aa
Pekelis, L. J. (2013). Classification and regression trees: A practical guide for describing a dataset. Bicoastal Datafest, Stanford University. Retrieved June 12, 2015, from http://statweb.stanford.edu/~lpekelis/talks/13_datafest_cart_talk.pdf
Solanki, A. D. (2014). Data mining techniques using WEKA classification for sickle cell disease. International Journal of Computer Science and Information Technologies, 5(4), 5857-5860. Retrieved June 12, 2015, from http://www.ijcsit.com/docs/Volume%205/vol5issue04/ijcsit20140504222.pdf