
A Heteroscedastic Regression Model for Survival Analysis - Statistics Project Example

Summary
The paper "Heteroscedastic Regression Model for Survival Analysis" studies survival rates for specific cancers, using covariates identified as relevant to each cancer. Linear combinations of such covariates, with coefficients as in multiple regression, are used to model survival…

Extract of sample "A Heteroscedastic Regression Model for Survival Analysis"

Introduction

Regression modeling of the relationship between an outcome variable and one or more independent variables is popular and applicable in virtually all fields, because plausible models can be easily fit, evaluated, and interpreted. Statistically, specifying a model requires choosing both a systematic component and an error component. Choosing the systematic component involves assessing how the average of the outcome variable relates to specific levels of the independent variables; an exploratory analysis of the current data, or prior experience, can provide the lead. Choosing the error component involves specifying the statistical distribution of what remains to be explained after the model is fit.

One may be interested in the impact of covariates on survival probabilities. For example, many cancers are associated with age, which makes the age of cancer patients a covariate. Diet and lifestyle also contribute to cancer risk. The link between cancer and smoking is common knowledge, as is the link with exposure to asbestos. Alcohol consumption affects a person's chances of developing certain cancers of the mouth, throat, or esophagus. Diet is also said to affect the chances of developing breast cancer: women whose diets consist mainly of dairy products are described as more susceptible than women who eat mainly fish. Diets low in fiber and high in red meat make one more susceptible to cancer of the bowel. Thus, when studying survival rates for a specific cancer, suitable covariates relevant to that cancer should be identified. One ought to use linear combinations of such covariates, with coefficients as in multiple regression, to model the hazard. One may also evaluate this hazard relative to its baseline.
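One common way to write a hazard that depends on a linear combination of covariates is the proportional hazards form sketched below. The text does not state which form the study uses, so this form and the symbols (h_0 for the baseline hazard, x_1, ..., x_p for covariates such as age or smoking status, beta_1, ..., beta_p for their coefficients) are assumptions made for illustration.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right)
```

Under this form, setting every covariate to zero leaves only the baseline hazard h_0(t), and each coefficient measures the change in the log hazard per unit change in its covariate.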
The study concerns a particular illness in which patients may die, stop treatment, or leave the study. Finite mixture models have been in use for more than a century. They form a flexible and extensible model class for approximating general distribution functions in a semi-parametric way, which has made them a popular technique for accounting for heterogeneity in the data. Applications have increased over the past ten years, as readily available computing power has made model estimation feasible. The simplest finite mixture models are finite mixtures of distributions, used for model-based clustering. Such a model is a combination of a finite number of different distributions, each of which is referred to as a component. Inserting different kinds of models into the components has led to more complex mixtures. An obvious extension is to estimate a generalized linear model (GLM) for each component. Finite mixtures of GLMs relax the assumption that the regression coefficients and dispersion parameters are the same for all observations. In contrast to mixed effects models, where the distribution of the parameters over the observations is assumed to be known, finite mixture models do not require this distribution to be specified a priori but allow it to be approximated from the data.

Model Specification

In the standard linear model, the dependent variable is assumed to follow a Gaussian distribution whose mean is determined through a linear relationship. The generalized linear model framework relaxes the Gaussian assumption: the distribution of the dependent variable is assumed to come from the exponential family.
The GLM framework makes it possible to take certain data characteristics into account, for example that the dependent variable is a count variable, which is generally assumed to follow a Poisson distribution. The canonical link is the identity function for the Gaussian distribution, the log function for the Poisson distribution, and the reciprocal function for the gamma distribution. The GLM framework is embedded in the finite mixture framework by placing GLMs in the components. Several special cases and extensions of this model class exist. For notational simplicity, the component-specific densities are assumed to come from the same parametric family for each component, and the link function is also assumed to be the same for all components. In clusterwise regression settings this is the obvious model choice, since no a priori knowledge about differences in the distributional families of the components is available. Another popular extension is a so-called concomitant variable model for the prior class probabilities, in which these probabilities also depend on a set of explanatory variables. A special case that uses different component-specific distributions is a model in which a single component is specified to follow a different distribution so that it can capture outlying observations. This approach is similar to the specification of zero-inflated models, where the component-specific densities come from the same parametric family but the parameters of one component are fixed a priori so that this component absorbs all excess zeros. To reduce the number of parameters, equality constraints over the components can be imposed on a subset of the component-specific parameters. A special case is the random intercept model, in which only the intercept follows a finite mixture distribution while all other regression coefficients are constant over the components.
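The embedding of GLMs in mixture components described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's model: the two-component weights, the coefficient vectors, and the covariate values are hypothetical, and the function names are chosen only for this sketch.

```python
import math

# Inverse canonical links: map the linear predictor eta to the mean mu.
INVERSE_CANONICAL_LINK = {
    "gaussian": lambda eta: eta,            # identity link
    "poisson": lambda eta: math.exp(eta),   # log link
    "gamma": lambda eta: 1.0 / eta,         # reciprocal link
}

def poisson_pmf(y, mu):
    """Poisson probability of the count y given the mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def mixture_of_poisson_glms(y, x, weights, coefs):
    """Density of a finite mixture of Poisson regressions at (y, x).

    weights: component weights (non-negative, summing to 1)
    coefs:   one coefficient vector per component, same length as x
    """
    inv_link = INVERSE_CANONICAL_LINK["poisson"]
    total = 0.0
    for pi_k, beta_k in zip(weights, coefs):
        eta = sum(b * xi for b, xi in zip(beta_k, x))  # linear predictor
        total += pi_k * poisson_pmf(y, inv_link(eta))
    return total

# Hypothetical two-component model with an intercept and one covariate.
weights = [0.3, 0.7]
coefs = [[0.0, 0.5], [1.0, -0.2]]
x = [1.0, 2.0]  # the leading 1.0 is the intercept term
density = [mixture_of_poisson_glms(y, x, weights, coefs) for y in range(60)]
```

Each component has its own regression coefficients, so the same covariate value leads to a different mean in each component; the mixture density is the weighted sum of the component densities.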
Random intercept models are often used when overdispersion is encountered in Poisson and binomial GLMs, in order to obtain a model that describes the data appropriately.

Statistical models are in general represented by parameter vectors. For finite mixture models, the parameter vector, which consists of the component weights and the component-specific parameters, determines the mixture distribution. The model is identifiable if different parameter vectors, apart from a relabeling of the components, lead to different mixture distributions. Identifiability rests on three conditions: generic identifiability of the component distributions, a coverage condition, and a rank condition. The proof is straightforward given previous results for finite mixtures of standard linear regressions by Hennig (2000) and for finite mixtures of GLMs and multinomial logit models with varying and fixed effects in the regression coefficients by Grün (2006) and Grün and Leisch (2007). The first condition is the generic identifiability of finite mixtures with the given component-specific distribution. If this distribution is Gaussian, Poisson, or gamma, the condition is no restriction, because mixtures of these distributions are generically identifiable. For the binomial distribution, the repetition parameter has to be checked for each observation to determine whether the condition holds. The coverage condition requires that for each individual there is one of the hyperplanes through the origin that covers all observations of this individual. The rank condition ensures that the regression coefficients are uniquely determined by the linear predictor. These conditions indicate that identifiability problems can occur especially if the covariate matrix contains categorical variables. Identifiability problems due to a violation of the coverage condition are referred to as intra-component label switching: if the labels are fixed at one covariate point according to some ordering constraint, they may switch at other covariate points for different parameterizations of the model.
For mixtures whose component distribution is identifiable, this means that the component weights and any dispersion parameters are unique, but the regression coefficients vary, because they depend on how the components are combined across the covariate points. This identifiability problem also matters for prediction, because the predicted value for new data, given the class membership, depends on the solution. Unidentified mixture models with several isolated non-trivial modes in the likelihood are to some extent a theoretical problem, because minimal changes to the component weights often make the model identified by breaking the symmetry. However, models close to an unidentified one will have multiple local modes. In a simple two-component mixture of regression models with intra-component label switching, for example, the model is unidentified only if both components have exactly the same probability.

Survival Analysis

Survival analysis is the analysis of the time until an event occurs. The times are random and may be measured in seconds, days, or any other unit: the time until a generator ceases to function, until a cancer patient dies, or until an unemployed individual finds employment. Assuming that the time to an event is normally distributed is unreasonable for many events. It is unreasonable, for instance, for an event whose risk of occurring is constant over time, because the time to the event then follows an exponential distribution. It is also unreasonable when analyzing survival times after a particularly serious surgery. The distribution might then have two modes: many patients die shortly after the surgery, but if a patient survives, the chances of the ailment returning later are high. A further problem is that a time to failure is always positive, whereas the normal distribution is supported on the entire real line.
In practice, however, the fact that failure times are always positive is not enough on its own to render the normal distribution useless. At its core, survival analysis amounts to little more than replacing the normality assumption with something more appropriate for the problem at hand. Someone already familiar with survival analysis, asked why linear regression is not used, might point to right censoring. Linear regression can, however, be adapted easily enough to deal with right censoring; the result goes under the name censored normal regression. The real problem with linear regression in survival applications is the assumed normality. Someone unfamiliar with survival analysis might be tempted to use linear regression despite non-normality. Linear regression is, after all, known to be remarkably robust to deviations from normality, so why not use it anyway? The problem is that the distribution of the time to an event can be very dissimilar from the normal: it is almost certainly asymmetric and may be bimodal, and linear regression is not robust to these violations.

Without getting lost in the mathematical details, an alternative is to perform a separate analysis at each failure time, using whatever binary analysis method seems appropriate. With sufficiently careful mathematics the separate analyses can be combined, and because none of them makes an assumption about the distribution of failure times, the combined analysis makes no such assumption either. This last statement is rather slippery, so it does not hurt to verify it. Consider two datasets that have dramatically different distributions of time, yet the same temporal ordering and the same values of x. Performing the individual analyses on each of these datasets gives the same results. Time plays no role other than ordering the observations.
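The claim that time only orders the observations can be checked numerically. The sketch below assumes a Cox-type partial likelihood as the way of combining the analyses at each failure time; the text does not name the method, so this choice, the function name, and the two small datasets are assumptions made for illustration.

```python
import math

def partial_log_likelihood(beta, times, events, x):
    """Cox-type partial log-likelihood for one covariate, assuming no tied times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through the risk sets
        # Subjects still under observation at time t_i
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk_set))
    return ll

# Two hypothetical datasets: same ordering and covariates, very different spacing.
times_a = [1.0, 2.0, 3.0, 4.0, 5.0]
times_b = [0.1, 7.0, 7.5, 300.0, 1000.0]
events = [1, 1, 0, 1, 1]  # 1 = failure observed, 0 = right-censored
x = [3.0, 1.0, 2.0, 0.5, 1.5]

ll_a = partial_log_likelihood(0.4, times_a, events, x)
ll_b = partial_log_likelihood(0.4, times_b, events, x)
```

Because the risk sets depend only on the order of the times, `ll_a` and `ll_b` agree for any value of `beta`, even though the two sets of times are on very different scales.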
The approach described above, in which only the ordering of the failure times matters, goes under the name semiparametric analysis: as far as time is concerned it is nonparametric, but because the effect of x is still parameterized, there is a parametric component to the analysis.

Nonparametric Analysis

Semiparametric models are parametric in the sense that the effects of the covariates are assumed to take a certain form. By performing a separate analysis at each failure time and considering only the order in which the failures occur, we make no assumption about the distribution of time to failure. We do, however, make an assumption about how each subject's observed x value determines the probability that the subject will fail. An entirely nonparametric approach does away with this assumption as well and follows the philosophy of letting the dataset speak for itself. There is a vast literature on nonparametric regression using methods such as local polynomial regression; however, such methods do not adequately deal with censoring and other issues unique to survival data. When there are no covariates, or when the covariates are qualitative in nature (gender, for instance), nonparametric methods such as those of Kaplan and Meier or Nelson and Aalen can be used to estimate the probability of survival past a certain time and to compare the survival experience of each gender. These methods account for censoring and other characteristics of survival data. There are also methods such as the two-sample log-rank test, which can compare the survival experience across genders using only the temporal ordering of the failure times. Nonparametric methods make no assumption about the distribution of the failure times or about how covariates shift or otherwise change the survival experience.

References

Cleves, M. A. (2008). An introduction to survival analysis using Stata. College Station, Tex: Stata Press.
Kragh, A. P., Væth, M., & Københavns University. (2009). Survival analysis. Copenhagen: Department of Biostatistics, University of Copenhagen.
Kleinbaum, D. G., & Klein, M. (2012). Survival analysis. New York: Springer.
Graßhoff, U., Holling, H., & Schwabe, R. (2009). On optimal design for a heteroscedastic model arising from random coefficients. Magdeburg: Univ., Fak. für Mathematik.
Wilcox, R. R. (2012). Modern statistics for the social and behavioral sciences: A practical introduction. Boca Raton, Fla: CRC Press.
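As a supplement to the nonparametric analysis section, the Kaplan and Meier method of estimating the probability of survival past a certain time can be sketched as follows. This is a minimal illustration on hypothetical data, not an analysis from the paper; the function name and the example times are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed failure time.

    times:  observed times (failure or censoring)
    events: 1 if the failure was observed, 0 if right-censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        failed = sum(1 for u, d in zip(times, events) if u == t and d)
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical sample: six subjects, two of them right-censored.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored subjects count in the risk set up to their censoring time but never count as failures, which is how the estimator accounts for censoring. Computing one curve per group, for example per gender, gives the comparison of survival experiences described in the text.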
