Fuzzy Logic with Data Mining with respect to Prediction and Clustering: Research Proposal

Introduction

Algorithms based on fuzzy logic are increasingly applied across disciplines to support the mining of large databases. Fuzzy logic assigns each statement a degree of truth between 0 and 1 rather than a strict "true" or "false", which allows prediction and clustering to be carried out on imprecise data. One potential application of fuzzy logic algorithms is the clustering of breast cancer data to help oncologists detect and evaluate breast cancer risks such as malignant tumors. According to Jemal and Ferlay (2004, p.69), breast cancer is a major health problem and the leading cause of cancer death among women worldwide. Early detection of cancer risks is therefore one of the key ways of improving the prognosis of the disease. Although a number of radiological techniques such as mammography can be used for the early detection of breast cancer risks, the enormous volume of data these techniques generate often makes it difficult for radiologists to evaluate breast cancer data accurately (Dorf and Bishop, 2001, p.234).

Artificial intelligence techniques such as fuzzy clustering algorithms can therefore significantly improve the diagnosis and evaluation of breast cancer risks by clustering the relevant data elements. Incorporating fuzzy logic algorithms in data mining provides a powerful tool for extracting, clustering, quantifying and analyzing database information used in the assessment and diagnosis of cancer risks. When dealing with uncertainties in databases, fuzzy clustering algorithms assign each data element a degree of membership in each cluster according to its closeness to that cluster (Castillo and Melin, 2008, p.94).
For example, during the evaluation of breast cancer risks, mammogram data may show some degree of fuzziness, such as ill-defined shapes, indistinct borders and varying densities. A fuzzy clustering algorithm is therefore one of the most effective ways of handling the fuzziness of breast cancer data. As an intelligent technique, fuzzy logic data mining algorithms can analyze such data well and produce results that are easy to implement. One of the greatest potential advantages of incorporating fuzzy logic in data mining is that such algorithms can model imprecise, non-linear and complex systems by encoding human knowledge and experience as a set of fuzzy rules over fuzzy variables, which are then used for inference (Nguyen and Walker, 2003, p.96). For example, when a fuzzy algorithm is used for the prediction and clustering of breast cancer data, human experience and knowledge about breast cancer risks can be expressed as a set of inference rules that are attached to the fuzzy logic system. Another important advantage of fuzzy systems for the prediction and clustering of breast cancer data is that they usually have a high inference speed. This paper proposes a fuzzy clustering algorithm that can be used in the data mining of breast cancer data and, consequently, in the evaluation and prediction of cancer risks in patients with suspected cancer.

Proposed single If-then fuzzy rule

Assume a c-class classification problem over an n-dimensional pattern space given by the unit cube $[0, 1]^n$, with m patterns $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$, where $x_{pi} \in [0, 1]$ for $p = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, n$. The task is to generate fuzzy if-then rules from these patterns.
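The formulation above requires every attribute value to lie in [0, 1]. The proposal does not say how raw attribute values are brought into that range; one common option, assumed here purely for illustration, is min-max scaling per attribute. The function name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(patterns):
    """Rescale every attribute (column) of a list of patterns to [0, 1]."""
    columns = list(zip(*patterns))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in patterns:
        scaled.append(tuple(
            # A constant column is mapped to 0.0 (an assumption, to avoid dividing by zero).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ))
    return scaled


# Made-up raw measurements with two attributes (not real patient data).
raw = [(10.0, 200.0), (15.0, 400.0), (20.0, 300.0)]
print(min_max_scale(raw))
```

After scaling, each pattern is a point in the unit cube, which is the input expected by the rule generation that follows.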
In the proposed approach, a single fuzzy if-then rule is generated for each class from the mean and standard deviation of the attribute values. The fuzzy if-then rule for the kth class is written as

$$\text{If } x_1 \text{ is } A_1^k \text{ and } \ldots \text{ and } x_n \text{ is } A_n^k \text{ then Class } k$$

where $A_i^k$ is the antecedent fuzzy set for the ith attribute value. The membership function of $A_i^k$ is specified as

$$A_i^k(x_i) = \exp\left(-\frac{(x_i - m_i^k)^2}{2\,(s_i^k)^2}\right) \qquad (2)$$

where $m_i^k$ is the mean and $s_i^k$ is the standard deviation of the ith attribute over the patterns of class k. For a two-dimensional classification problem, the membership function of each antecedent fuzzy set is thus specified by both the mean and the standard deviation of the attribute values. For a new pattern $x_p = (x_{p3}, x_{p4})$, the rule that classifies the pattern is the one with the largest product of membership values:

$$A_1^{*}(x_{p3}) \cdot A_2^{*}(x_{p4}) = \max\{A_1^k(x_{p3}) \cdot A_2^k(x_{p4}) : k = 1, 2\} \qquad (3)$$

Steps of the proposed fuzzy clustering algorithm

All data indicating malignant breast tumors are assigned to the high-risk class. Under the if-then rule, if variable 1 is low in cancer risk while variable 2 is high, then the output is benign; otherwise the output is malignant. This means that if the antecedent is true, the consequent is also taken to be true. The second part of the fuzzy classification system applies the result of the antecedent to the consequent (the inference step). The fuzzy clustering algorithm based on the mean and standard deviation of the breast cancer attribute values generates a single if-then rule for each of the fuzzy classes (Mendel, 1995, p.371). Consequently, the membership function of each antecedent fuzzy set is specified by both the mean and the standard deviation of the given attribute values.

The proposed fuzzy clustering algorithm for the prediction and clustering of breast cancer risk data works by dividing the data elements into clusters whose members share many similarities.
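The rule generation and the membership and maximum-product equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the proposal's implementation: the function names, the use of the population standard deviation, the small floor on the standard deviation and the toy patterns are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def gaussian_membership(x, m, s):
    """Equation (2): exp(-(x - m)^2 / (2 s^2))."""
    return math.exp(-((x - m) ** 2) / (2.0 * s ** 2))


def build_rules(patterns, labels):
    """Generate one rule per class: a (mean, std) pair for every attribute."""
    rules = {}
    for k in sorted(set(labels)):
        rows = [p for p, y in zip(patterns, labels) if y == k]
        columns = list(zip(*rows))
        # Population std is an assumption; the floor avoids division by zero.
        rules[k] = [(mean(col), max(pstdev(col), 1e-6)) for col in columns]
    return rules


def classify(x, rules):
    """Equation (3): pick the class whose product of memberships is largest."""
    def compatibility(k):
        return math.prod(
            gaussian_membership(xi, m, s) for xi, (m, s) in zip(x, rules[k])
        )
    return max(rules, key=compatibility)


# Toy, made-up patterns in [0, 1]^2 (not real patient data).
patterns = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.15),
            (0.80, 0.90), (0.85, 0.75), (0.90, 0.80)]
labels = ["benign"] * 3 + ["malignant"] * 3

rules = build_rules(patterns, labels)
print(classify((0.12, 0.22), rules))  # close to the first group of patterns
print(classify((0.88, 0.82), rules))  # close to the second group of patterns
```

Because each class contributes exactly one rule, training reduces to computing per-class means and standard deviations, which is why this kind of classifier is cheap to build and fast at inference.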
Each of the data clusters is then associated with a specific set of fuzzy membership functions that indicate how close a data element is to the cluster (Bezdek, 1981, p.78). The first step involves determining the input and output variables that describe cancer risks, such as the presence of malignant and benign tumors in the patients, together with their variation intervals. Next, a set of linguistic values, each with a membership function that maps the numerical range, is defined for every fuzzy variable. Once the fuzzy inference rules between the inputs and the output have been defined, fuzzification and defuzzification of the cancer data are carried out. In general, this kind of fuzzy algorithm uses the mean and standard deviation of the attribute values to generate the if-then rules.

The central notion of this fuzzy clustering algorithm is that the membership values of fuzzy sets, or the truth values of fuzzy logic, range from 0.0 to 1.0. In the case of clustering breast cancer data, the value 0.0 represents absolute falseness of the proposition that breast cancer is present, for example when no malignant tumors are detected in the patient (Watanabe, 1994, p.91). The value 1.0, on the other hand, represents absolute risk of breast cancer development, based on information such as the presence of malignant tumors in the patient. As noted earlier, the fuzzy system is characterized by a set of linguistic variables that are largely based on expert knowledge. The major variables in breast cancer databases include information on risk factors as well as mammographic findings. Although the primary causes of cancer are not yet known, a number of risk factors have been identified and can therefore be assigned to particular classes. Generally, tumors can be malignant (cancerous) or benign (non-cancerous).
In most cases, malignant tumors grow rapidly, often destroying normal tissue and eventually spreading to other parts of the body. Benign tumors, on the other hand, tend to be localized and grow slowly without any significant spread to other parts of the body (Mukherjee, 2010, p.45). Consequently, the risk of breast cancer development is generally higher when malignant tumors are detected in an individual.

Classification of breast cancer data using the proposed fuzzy clustering algorithm

The diagnostic procedure generally used to identify breast cancer consists of examining breast imaging results to identify the characteristics that define the stage of cancerous development of the detected lesions. Based on previous studies and analyses of breast cancer screening cases, a set of characteristics that best indicate the presence of a malignant breast tumor has been identified. These characteristics include the presence of spiculations, calcifications and irregular margins of the breast lesions. Based on these characteristics, the proposed fuzzy clustering algorithm can be used to classify breast cancer databases in a concise and standardized way (DeSilva, 1994, p.80). This can be achieved by assigning numerical codes to the different categories of breast cancer risk. For example, the assessment categories can be assigned numerical values such as 1 = Negative, 2 = Benign findings and 3 = Malignant findings. Most of the data on breast cancer risk factors are obtained from mammography, self-examination, ultrasound testing and biopsy, among other diagnostic procedures.
When the if-then rule is used in designing the fuzzy clustering algorithm for the prediction of breast cancer risks, a number of definitions connect the fuzzy sets with their corresponding membership functions, which can be implemented effectively as fuzzy conditional statements (Prieto and Ortega, 2002, p.132). In this regard, all cases of malignant tumors in the patients are classified as high risk. Age is another important variable that significantly affects the risk of breast cancer. According to Kleinsmith (2006, p.57), this is because the risk of developing breast cancer often increases with age. Consequently, women in their late 60s have a higher risk of developing breast cancer than women in their 20s. It is worth noting, however, that although the probability of breast cancer rises significantly with age, breast cancer is generally more aggressive in younger individuals.

The construction of the fuzzy logic system is a complex process that involves choosing appropriate membership functions and assigning relevant factors to the fuzzy logic input variables, such as age, a measure of cancer infiltration in the adjacent cells and tumor surface area, in order to determine the output variable (breast cancer risk). In the proposed fuzzy classification algorithm, the ranges of both the input and output variables should be set so that they agree with the clinical findings. The breast cancer risk is then expressed as a percentage, such that 100% corresponds to high cancer risk while 0% corresponds to no cancer risk. Lastly, after defuzzification, the breast cancer risk computed by the fuzzy clustering algorithm can be compared with clinical values such as Breast Imaging Reporting and Data System (BIRADS) scores (Pazdur, 2009, p.181).
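The construction described above (fuzzify the inputs, fire the if-then rules, defuzzify to a risk percentage) can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the triangular membership functions, the rescaling of inputs to [0, 1], the three rules and their output levels, the use of the minimum for "and", and the weighted-average defuzzification. None of these choices is stated in the proposal, and the numbers are not clinical values.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic values for inputs already rescaled to [0, 1].
LOW = (-1.0, 0.0, 1.0)   # full membership at 0, none at 1
HIGH = (0.0, 1.0, 2.0)   # full membership at 1, none at 0


def fuzzify(x):
    """Map a crisp input to its membership degree in each linguistic value."""
    return {"low": triangular(x, *LOW), "high": triangular(x, *HIGH)}


# Hypothetical rules: (antecedents, risk output in percent).
RULES = [
    ({"age": "high", "infiltration": "high", "tumor_area": "high"}, 100.0),
    ({"age": "high", "infiltration": "low", "tumor_area": "high"}, 50.0),
    ({"age": "low", "infiltration": "low", "tumor_area": "low"}, 0.0),
]


def risk_percent(crisp_inputs, rules=RULES):
    """Fire every rule (AND = minimum) and defuzzify by weighted average."""
    memberships = {name: fuzzify(x) for name, x in crisp_inputs.items()}
    weighted, total = 0.0, 0.0
    for antecedents, output in rules:
        strength = min(memberships[var][value] for var, value in antecedents.items())
        weighted += strength * output
        total += strength
    return weighted / total if total > 0 else None  # None: no rule fired


print(risk_percent({"age": 1.0, "infiltration": 0.75, "tumor_area": 1.0}))
print(risk_percent({"age": 0.0, "infiltration": 0.0, "tumor_area": 0.0}))
```

The output lies between 0 and 100 by construction, so it can be placed side by side with a clinical score once the rule base and membership ranges have been tuned against clinical findings.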
Statistical analysis of the efficiency of the proposed fuzzy clustering algorithm

To determine the efficiency of the algorithm, a breast cancer data set containing 32 attributes and 569 instances was obtained from a machine learning data repository. Of the 569 instances recorded in the data, 212 were of the malignant class and the remaining 357 were of the benign class.

Figure 1. Summary of the statistical details of the data set

Class          Frequency   Percent   Valid percent   Cumulative percent
1 (malignant)  212         37.3      37.3            37.3
2 (benign)     357         62.7      62.7            100.0
Total          569         100.0     100.0

To determine the efficiency of the proposed fuzzy clustering algorithm, which uses the mean and standard deviation of the attribute values to classify the data set, the above breast cancer database was analyzed using the proposed approach. Using the maximum-product rule of equation (3), $A_1^{*}(x_{p3}) \cdot A_2^{*}(x_{p4}) = \max\{A_1^k(x_{p3}) \cdot A_2^k(x_{p4}) : k = 1, 2\}$, the empirical results indicated a classification efficiency of 92.2% for the mean and standard deviation fuzzy classification method when compared with the other radiological findings (Jain and Abraham, 2003, p.516). The statistical analysis showed that a fuzzy rule based system has a relatively high classification ability, so the algorithm can be used to analyze breast cancer data and, consequently, to predict breast cancer risks. In addition, fuzzy rule generation that uses the mean and standard deviation is easy to implement, since the algorithm depends only on the mean and standard deviation of the given attribute values (Park and Sandberg, 1991, p.249). The performance of this kind of fuzzy clustering algorithm can, however, be improved further when the various rule parameters are optimized.
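As a quick arithmetic check on the summary table in this section, the class percentages follow directly from the reported counts (212 malignant and 357 benign out of 569). The variable names below are illustrative only.

```python
# Class counts as reported for the data set.
counts = {"malignant": 212, "benign": 357}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # number of instances
print(shares)  # percentage of each class, rounded to one decimal place
```

The malignant class therefore makes up a little over a third of the instances, which matters when reading a single overall classification rate, because a classifier can score well on the majority class alone.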
Conclusion

In conclusion, the proposed fuzzy logic classification system can be used to predict the risk of breast cancer by analyzing the attribute values of the data sets with fuzzy rules generated from the mean and standard deviation of those values. Although the results may require further validation, a fuzzy rule based system has a relatively high classification rate, and the algorithm can therefore be used to improve the diagnosis and evaluation of breast cancer risks by clustering the relevant data elements.

References

Bezdek, J.C. 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum.
Castillo, O. & Melin, P. 2008. Type-2 Fuzzy Logic: Theory and Applications. Berlin: Springer.
DeSilva, C.J. 1994. Artificial Neural Networks and Breast Cancer Prognosis. The Australian Computer Journal, 26, 78-81.
Dorf, R.C. & Bishop, R.H. 2001. Modern Control Systems. 9th ed. Upper Saddle River: Prentice Hall.
Jain, R. & Abraham, A. 2003. A Comparative Study of Fuzzy Classifiers on Breast Cancer Data. Journal of Computer Science, 26, 512-519.
Jemal, A.B. & Ferlay, M.M. 2004. Global Cancer Statistics. CA: A Cancer Journal for Clinicians, 61(2), 69-90.
Kleinsmith, L.J. 2006. Principles of Cancer Biology. Pearson Benjamin Cummings.
Mendel, J. 1995. Fuzzy Logic Systems for Engineering: A Tutorial. Proceedings of the IEEE, 83(3), 345-377.
Mukherjee, S. 2010. The Emperor of All Maladies: A Biography of Cancer. New York: Simon and Schuster.
Nguyen, H.T. & Walker, C.L. 2003. A First Course in Fuzzy and Neural Control. New York: Chapman & Hall.
Park, J. & Sandberg, I.W. 1991. Universal Approximation Using Radial Basis Functions Network. Neural Computation, 3, 246-257.
Pazdur, R. 2009. Cancer Management: A Multidisciplinary Approach. New York: CMP United Business Media.
Prieto, A. & Ortega, J. 2002. A New Clustering Technique for Function Approximation. IEEE Transactions on Neural Networks, 13(1), 132-142.
Watanabe, H. 1994. The Application of a Fuzzy Discrimination Analysis for Diagnosis of Valvular Heart Disease. IEEE Transactions on Fuzzy Systems, 6, 78-94.
