Regression on imperfect class labels derived by unsupervised clustering
Brøndum, Rasmus Froberg; Michaelsen, Thomas Yssing; Bøgsted, Martin
In biomarker studies it is common to perform unsupervised clustering of high-dimensional data, such as genome-wide screens of SNPs, gene expression, and protein data, and then to regress outcomes (for example treatment response, patient-reported outcome measures, time to disease progression, or overall survival) on these potentially mislabelled clusters. It is well known from the statistical literature that errors in continuous and categorical covariates can lead to loss of important information about effects on outcome (Carroll et al., 2006). To our surprise, however, this is often ignored when outcomes are regressed on classes identified by unsupervised learning, which may cause clinically important effects to be overlooked (Alizadeh et al., 2000; Veer et al., 2002; Guinney et al., 2015; Zhan et al., 2006; Broyl et al., 2010). We suggest casting the problem as one of covariate misclassification. This opens up a wide range of modelling and analysis options; see, for example, the book by Carroll et al. (2006) or the recent review by Brakenhoff et al. (2018).
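To make the loss of information concrete, the sketch below simulates the simplest case: a binary true class `x`, an observed cluster label `w` produced by a 2x2 misclassification matrix with assumed sensitivity and specificity, and a linear outcome model. Regressing the outcome on `w` shrinks the slope by the factor P(X=1 | W=1) − P(X=1 | W=0); dividing by that factor gives a simple moment-based correction. This is a textbook illustration under stated assumptions (nondifferential misclassification, known sensitivity, specificity and class prevalence), not the authors' method, and all function names and parameter values are hypothetical.

```python
import numpy as np


def attenuation_factor(pi, sens, spec):
    """Shrinkage of the OLS slope under nondifferential misclassification
    of a binary covariate: P(X=1 | W=1) - P(X=1 | W=0)."""
    p_w1 = sens * pi + (1 - spec) * (1 - pi)   # P(W = 1)
    ppv = sens * pi / p_w1                     # P(X = 1 | W = 1)
    fomr = (1 - sens) * pi / (1 - p_w1)        # P(X = 1 | W = 0)
    return ppv - fomr


def simulate(n=200_000, pi=0.4, beta=(1.0, 2.0), sens=0.85, spec=0.9, seed=1):
    """Hypothetical data: true class x, outcome y, observed (mislabelled) cluster w."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, pi, size=n)                        # latent true class
    y = beta[0] + beta[1] * x + rng.normal(0.0, 1.0, size=n)
    u = rng.random(n)
    # x = 1 is labelled 1 with prob. sens; x = 0 is labelled 1 with prob. 1 - spec
    w = np.where(x == 1, (u < sens).astype(int), (u >= spec).astype(int))
    return x, y, w


def ols_slope(z, y):
    """For a binary regressor the OLS slope equals the difference in group means."""
    return y[z == 1].mean() - y[z == 0].mean()


if __name__ == "__main__":
    x, y, w = simulate()
    lam = attenuation_factor(pi=0.4, sens=0.85, spec=0.9)
    naive = ols_slope(w, y)
    print(f"slope on true class:     {ols_slope(x, y):.3f}")
    print(f"slope on observed label: {naive:.3f}")
    print(f"corrected slope:         {naive / lam:.3f}")
```

With these illustrative settings the attenuation factor is 0.75, so a true effect of 2 appears as roughly 1.5 when the mislabelled clusters are used directly. In practice sensitivity and specificity are unknown and must be estimated or modelled, which is where the options discussed by Carroll et al. (2006) and Brakenhoff et al. (2018) come in.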