C-mix: a high dimensional mixture model for censored durations, with applications to genetic data
Simon Bussy, Agathe Guilloux, Stéphane Gaïffas, Anne-Sophie Jannot
Predicting subgroups of patients with different prognoses is a key challenge for personalized medicine; see, for instance, Alizadeh et al. [2000] and Rosenwald et al. [2002], where subgroups of patients with different survival rates are identified from gene expression data. A substantial number of techniques in the literature predict the subgroup of a given patient in a classification setting, namely when the subgroups are known in advance [Golub et al., 1999, Hastie et al., 2001, Tibshirani et al., 2002]. In the present paper we consider the much more difficult case where the subgroups are unknown. In this situation, a widespread approach consists of first applying unsupervised learning techniques to the covariates (for instance, to gene expression data [Bhattacharjee et al., 2001, Beer et al., 2002, Sørlie et al., 2001]) to define subsets of patients, and then estimating the risk within each subset. The drawback of such techniques is that, because the subsets are built without using the survival times, there is no guarantee that the identified subgroups will have different risks.
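To make the two-step approach and its weakness concrete, here is a minimal sketch under stated assumptions: a single scalar covariate, k-means (Lloyd's algorithm) as the unsupervised step, and the Kaplan-Meier estimator for the per-subgroup survival. The function names (`kmeans_1d`, `kaplan_meier`, `two_step_subgroups`) and the toy data are hypothetical illustrations; this is the baseline the paragraph criticizes, not the C-mix model.

```python
from typing import Dict, List, Tuple


def kmeans_1d(xs: List[float], k: int = 2, iters: int = 20) -> List[int]:
    """Lloyd's algorithm on a single covariate; returns one cluster label per patient."""
    # Deterministic initialisation from evenly spaced order statistics (illustrative choice).
    step = max(1, len(xs) // k)
    centers = sorted(xs)[::step][:k]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each patient goes to the nearest center.
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) at each distinct observed event time.

    events[i] is 1 if the event (e.g. death) was observed, 0 if the duration is censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group ties: all patients whose duration equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both observed events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


def two_step_subgroups(xs, times, events, k: int = 2):
    """Step 1: cluster on covariates only. Step 2: estimate survival within each cluster."""
    labels = kmeans_1d(xs, k)
    curves: Dict[int, List[Tuple[float, float]]] = {}
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        curves[j] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return labels, curves


# Hypothetical toy data: the covariate forms two well-separated groups,
# but durations and censoring are identical in both groups.
x = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
durations = [2, 4, 6, 8, 2, 4, 6, 8]
observed = [1, 1, 0, 1, 1, 1, 0, 1]

labels, curves = two_step_subgroups(x, durations, observed)
print(labels)
print(curves)
```

The clustering separates the patients perfectly in covariate space, yet both clusters receive exactly the same estimated survival curve: nothing in step 1 looks at the durations, so nothing forces the resulting subgroups to differ in risk.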
Nov-25-2017