Merging versus Ensembling in Multi-Study Machine Learning: Theoretical Insight from Random Effects
Zoe Guan, Giovanni Parmigiani, Prasad Patil
A critical decision point when training predictors using multiple studies is whether these studies should be combined or treated separately. We compare two multi-study learning approaches in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We consider 1) merging all of the datasets and training a single learner, and 2) cross-study learning, which involves training a separate learner on each dataset and combining the resulting predictions. In a linear regression setting, we show analytically and confirm via simulation that merging yields lower prediction error than cross-study learning when the predictor-outcome relationships are relatively homogeneous across studies. However, as heterogeneity increases, there exists a transition point beyond which cross-study learning outperforms merging. We provide analytic expressions for the transition point in various scenarios and study asymptotic properties.
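The comparison described above can be sketched as a small simulation. This is an illustrative sketch, not the authors' code: it assumes Gaussian predictors, unit noise variance, no intercept, equal study sizes, and equal ensemble weights, with study-specific coefficients drawn around a common mean with variance `sigma2` (the heterogeneity). The function names `simulate_once` and `compare` and all parameter defaults are hypothetical choices made for the example.

```python
import numpy as np


def simulate_once(rng, n_studies=4, n_per_study=50, n_features=10, sigma2=0.0):
    """Return (merged_error, ensemble_error) for one simulated set of studies."""
    beta = np.ones(n_features)  # assumed common mean of the coefficients
    X_list, y_list = [], []
    for _ in range(n_studies):
        # Random effects: each study's coefficients deviate from the mean.
        beta_k = beta + rng.normal(0.0, np.sqrt(sigma2), n_features)
        X = rng.normal(size=(n_per_study, n_features))
        y = X @ beta_k + rng.normal(size=n_per_study)
        X_list.append(X)
        y_list.append(y)

    # Approach 1: merge all studies and fit a single least-squares learner.
    b_merged = np.linalg.lstsq(
        np.vstack(X_list), np.concatenate(y_list), rcond=None
    )[0]

    # Approach 2: fit one learner per study and average the predictions.
    # For linear models with equal weights this equals averaging coefficients.
    b_ensemble = np.mean(
        [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)],
        axis=0,
    )

    # Expected squared prediction error on a new study with its own random
    # effect, using the closed form for x ~ N(0, I) and unit noise variance.
    beta_new = beta + rng.normal(0.0, np.sqrt(sigma2), n_features)

    def expected_error(b):
        return 1.0 + float(np.sum((b - beta_new) ** 2))

    return expected_error(b_merged), expected_error(b_ensemble)


def compare(sigma2, n_reps=200, seed=0):
    """Average the two errors over n_reps independent replications."""
    rng = np.random.default_rng(seed)
    errors = np.array([simulate_once(rng, sigma2=sigma2) for _ in range(n_reps)])
    merged_error, ensemble_error = errors.mean(axis=0)
    return merged_error, ensemble_error
```

Calling `compare` over a grid of `sigma2` values and checking where the ordering of the two errors flips gives a rough empirical picture of the transition point under these assumptions; the exact location depends on the sample sizes, the weighting scheme, and the number of replications.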
May-17-2019
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Oncology (1.00)
- Endocrinology > Diabetes (0.68)