Retrieved from https://studentshare.org/statistics/1697627-regression-analysis-project
Statistics Project: Regression Analysis Project

PROBLEM 2

Model Specification

The model used in this test is a logistic regression, written here in shorthand as:

P = kv

where P is the probability that a patient tests positive for prostate cancer, k is the regression coefficient, and v is the variable that most strongly determines P. Strictly, logistic regression does not multiply k and v directly into a probability. It relates them through the log-odds, log(P / (1 - P)) = b0 + k * v, where b0 is an intercept. The variables cancer, inv and cap are set apart from the rest (the analysis calls them outliers) because they take fixed, categorical values rather than measured quantities.
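The shorthand P = kv can be made concrete with the standard logistic form, in which k multiplies v on the log-odds scale and the result is mapped back to a probability. This is a minimal sketch, assuming an intercept b0 that the original text does not state. The function names and the coefficient values in the demo are hypothetical and for illustration only.

```python
import math


def logistic(eta: float) -> float:
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of logistic(): probability -> log-odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def prob_positive(b0: float, k: float, v: float) -> float:
    """Logistic version of 'P = kv': P = logistic(b0 + k * v)."""
    return logistic(b0 + k * v)


if __name__ == "__main__":
    # Hypothetical intercept and slope, not estimates from this analysis.
    for v in (0.0, 1.0, 5.0):
        print(v, round(prob_positive(-2.0, 0.8, v), 4))
```

With a positive k, the probability rises monotonically with v but never leaves the interval (0, 1). A direct product k * v cannot guarantee this.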
Transformation and Important Variables

The necessary transformation in this regression is the binomial (logit) transformation, implemented in the code as “mylogit”. The most important variable is c.vol, because the estimated volume of the prostate cancer is the strongest determinant of whether a patient tests positive or negative (Howell, 2010). The best model therefore uses c to represent c.vol, and becomes:

P = kc

The estimates are reported with confidence limits at 97.5%. Paired with a lower limit at 2.5%, a 97.5% limit corresponds to a two-sided 95% confidence interval.

Output

The code for this regression is given in Appendix A, and the output in Appendix B.
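The quantities reported in such output (a coefficient, its Pr value and its confidence limits) usually follow the Wald construction for logistic regression. The z statistic is the estimate divided by its standard error, and the two-sided p-value is 2(1 - Φ(|z|)). The 2.5% and 97.5% limits together form a 95% interval, and exp(estimate) is the odds ratio. This is a minimal sketch under that assumption. The standard error is not quoted in the text, so the value used below is a hypothetical placeholder, and `wald_summary` is an illustrative name.

```python
import math
from statistics import NormalDist


def wald_summary(estimate: float, se: float, level: float = 0.95) -> dict:
    """Wald z-test, two-sided p-value, confidence interval and odds ratio
    for a single logistic-regression coefficient."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    std_normal = NormalDist()
    z = estimate / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # For level=0.95 this is the 97.5% quantile (about 1.96), so the
    # interval runs from the 2.5% point to the 97.5% point.
    crit = std_normal.inv_cdf(0.5 + level / 2.0)
    lower, upper = estimate - crit * se, estimate + crit * se
    return {
        "z": z,
        "p_value": p_value,
        "ci": (lower, upper),
        "odds_ratio": math.exp(estimate),
        "odds_ratio_ci": (math.exp(lower), math.exp(upper)),
    }


if __name__ == "__main__":
    # 0.80404 is the c.vol estimate quoted in the text; se=0.33 is a
    # hypothetical placeholder, not a value from the Appendix B output.
    for key, value in wald_summary(0.80404, se=0.33).items():
        print(key, value)
```

Reading a coefficient through exp(estimate) gives the multiplicative change in the odds of a positive test per unit increase in the predictor. This is usually easier to interpret than the raw log-odds coefficient.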
The logistic regression coefficient for c.vol is 0.80404, a large positive coefficient, with a reported Pr value (the p-value of the coefficient's test) of 0.01539. This indicates that c.vol is significantly associated with a positive test. The second variable associated with a positive test is psa, with a small positive coefficient of 0.00226 and Pr = 0.03847. The remaining variables are negatively associated with a positive test (Long, 1997).

Prediction

This test estimates the cancer diagnosis for a patient with psa = 10, c.vol = 5, a weight of 40 g, age 67, benign = 2.5, no seminal vesicle invasion, and cap = 0.5 cm. The prediction uses the same model, and the results are given in Appendix C. The reported coefficient of association is 1.46414, with a probability of 0.160296. On this basis the analysis classifies the patient as testing positive for prostate cancer.

PROBLEM 3

Model Specification

The model used in this test is a logistic regression, written in shorthand as:

P = Cv

where P is the probability that gender or treatment has a significant effect on blood calcium level, C is the regression coefficient, and v is the variable that most strongly determines P.
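Gender and treatment are categorical. In a fitted regression they would typically enter as 0/1 indicator (dummy) variables measured against a reference level, rather than as a single numeric v. This is a minimal sketch of that treatment coding. The text names only Male and T2, so the reference levels Female and T1, the full level sets, and the column names are hypothetical assumptions.

```python
from typing import Dict, Sequence


def dummy_code(value: str, levels: Sequence[str], prefix: str) -> Dict[str, int]:
    """Treatment (dummy) coding: the first level is the reference and gets
    no column; every other level gets a 0/1 indicator column."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}; expected one of {list(levels)}")
    return {f"{prefix}{lvl}": int(value == lvl) for lvl in levels[1:]}


def design_row(gender: str, treatment: str,
               gender_levels: Sequence[str],
               treatment_levels: Sequence[str]) -> Dict[str, int]:
    """Build one design-matrix row: an intercept plus indicator columns."""
    row = {"(Intercept)": 1}
    row.update(dummy_code(gender, gender_levels, "gender"))
    row.update(dummy_code(treatment, treatment_levels, "treatment"))
    return row


if __name__ == "__main__":
    # Hypothetical level sets; Female and T1 are the assumed reference levels.
    print(design_row("Male", "T2", ["Female", "Male"], ["T1", "T2"]))
```

Under this coding, each coefficient measures the shift in log-odds from the reference level. A "Male" coefficient, for example, compares males with the reference gender.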
There are no outliers in this dataset.

Transformation and Important Variables

The necessary transformation in this regression is the binomial (logit) transformation, implemented in the code as “mylogit”. Gender and treatment are both significant in determining the result of the hypothesis test (Hosmer & Lemeshow, 2000). The best model therefore combines the two variables as follows:

P = C * g * T

where C is the regression coefficient, g is gender and T is treatment.
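As a regression, the product g * T corresponds to a model with main effects for gender and treatment plus their interaction. In R's formula notation, gender * treatment expands to exactly these three terms. This is a minimal sketch of computing a fitted probability for one gender/treatment combination. It assumes a logit link and 0/1 indicators (g = 1 for Male, t = 1 for T2), and every coefficient value below is a hypothetical placeholder, not an estimate from Appendix E.

```python
import math
from dataclasses import dataclass


@dataclass
class InteractionLogit:
    """Logit model with two 0/1 predictors and their interaction:
    log-odds = b0 + b_g * g + b_t * t + b_gt * (g * t)."""
    b0: float
    b_g: float
    b_t: float
    b_gt: float

    def log_odds(self, g: int, t: int) -> float:
        if g not in (0, 1) or t not in (0, 1):
            raise ValueError("g and t must be 0/1 indicators")
        return self.b0 + self.b_g * g + self.b_t * t + self.b_gt * g * t

    def probability(self, g: int, t: int) -> float:
        # The inverse logit maps the log-odds back to a probability in (0, 1).
        return 1.0 / (1.0 + math.exp(-self.log_odds(g, t)))


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only.
    model = InteractionLogit(b0=-1.0, b_g=0.5, b_t=0.3, b_gt=0.2)
    # Assumed coding: g = 1 for Male, t = 1 for T2.
    print(round(model.probability(g=1, t=1), 4))
```

The interaction coefficient b_gt only contributes when both indicators equal 1. This allows the effect of treatment T2 to differ between genders.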
The estimates are again reported with confidence limits at 97.5%, the upper limit of a two-sided 95% confidence interval.

Output and Interpretation

The code for this regression is given in Appendix D, and the output in Appendix E. The logistic regression coefficient for treatment is 0.0206, a small positive coefficient, with Pr = 0.0247. This indicates that treatment significantly affects blood calcium level. The second variable with an effect is gender (g), with a large positive coefficient of 0.7004 and Pr = 0.0139.

Prediction

The prediction of blood calcium level is made for gender = Male and treatment = T2, and the results are shown in Appendix F. Treatment T2 gives a coefficient of 0.00348 with Pr = 0.03744, and gender “Male” gives a coefficient of 0.0156 with Pr = 0.414. Both coefficients are positive. At the conventional 0.05 level only the T2 term is significant, since 0.414 > 0.05. The blood calcium level is then calculated as follows:

C = p / (g * T)
C = 0.22572 / (1 * 0.00954) = 23.6604 (to four decimal places)

References

Hosmer, D., & Lemeshow, S. (2000). Applied Logistic Regression (2nd ed.). New York: John Wiley & Sons.
Howell, D. C. (2010). Statistical Methods for Psychology (7th ed.). Belmont, CA: Thomson Wadsworth.
Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

APPENDIX

Appendix A

df