C-mix: a high dimensional mixture model for censored durations, with applications to genetic data

Bussy, Simon; Guilloux, Agathe; Gaïffas, Stéphane; Jannot, Anne-Sophie

arXiv.org Machine Learning 

Predicting subgroups of patients with different prognoses is a key challenge for personalized medicine; see for instance Alizadeh et al. [2000] and Rosenwald et al. [2002], where subgroups of patients with different survival rates are identified based on gene expression data. A substantial number of techniques can be found in the literature to predict the subgroup of a given patient in a classification setting, namely when subgroups are known in advance [Golub et al., 1999, Hastie et al., 2001, Tibshirani et al., 2002]. In the present paper, we consider the much more difficult case where subgroups are unknown. In this situation, a first widespread approach consists in applying unsupervised learning techniques to the covariates, for instance to the gene expression data [Bhattacharjee et al., 2001, Beer et al., 2002, Sørlie et al., 2001], to define subsets of patients, and then estimating the risks in each of them. The problem with such techniques is that there is no guarantee that the identified subgroups will have different risks.
