Merging versus Ensembling in Multi-Study Machine Learning: Theoretical Insight from Random Effects

Zoe Guan, Giovanni Parmigiani, Prasad Patil

May 17, 2019 – arXiv.org Machine Learning

A critical decision point when training predictors using multiple studies is whether these studies should be combined or treated separately. We compare two multi-study learning approaches in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We consider 1) merging all of the datasets and training a single learner, and 2) cross-study learning, which involves training a separate learner on each dataset and combining the resulting predictions. In a linear regression setting, we show analytically and confirm via simulation that merging yields lower prediction error than cross-study learning when the predictor-outcome relationships are relatively homogeneous across studies. However, as heterogeneity increases, there exists a transition point beyond which cross-study learning outperforms merging. We provide analytic expressions for the transition point in various scenarios and study asymptotic properties.
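The comparison described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' code or their analytic results: the function name `simulate`, the default sizes, standard-normal predictors, ordinary least squares without an intercept, and equal ensemble weights are all illustrative choices. Each study's coefficients are drawn as a shared mean plus a Gaussian random effect with standard deviation `sigma_beta`, which plays the role of the heterogeneity. For linear learners with equal weights, averaging the fitted coefficients is the same as averaging the predictions.

```python
import numpy as np

def simulate(sigma_beta, n_studies=5, n_per_study=15, n_features=10,
             sigma_eps=1.0, n_reps=500, seed=0):
    """Return (merged_error, ensemble_error): expected squared prediction
    error on a new study, averaged over n_reps simulated training sets.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    beta0 = np.ones(n_features)  # mean coefficient vector shared by all studies
    sq_merge = 0.0
    sq_ens = 0.0
    for _ in range(n_reps):
        Xs, ys, fits = [], [], []
        for _ in range(n_studies):
            # Random effect: each study has its own coefficient vector.
            beta_k = beta0 + sigma_beta * rng.standard_normal(n_features)
            X = rng.standard_normal((n_per_study, n_features))
            y = X @ beta_k + sigma_eps * rng.standard_normal(n_per_study)
            Xs.append(X)
            ys.append(y)
            # Study-specific least squares fit.
            fits.append(np.linalg.lstsq(X, y, rcond=None)[0])
        # Merging: one least squares fit on the stacked data.
        beta_merge = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys),
                                     rcond=None)[0]
        # Cross-study learning with equal weights.
        beta_ens = np.mean(fits, axis=0)
        sq_merge += np.sum((beta_merge - beta0) ** 2)
        sq_ens += np.sum((beta_ens - beta0) ** 2)
    # For a test point x ~ N(0, I) from a new study with its own random effect,
    # the expected squared error is ||beta_hat - beta0||^2 plus a term that is
    # the same for both approaches.
    shared = n_features * sigma_beta ** 2 + sigma_eps ** 2
    return sq_merge / n_reps + shared, sq_ens / n_reps + shared

if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 5.0):
        merged, ensemble = simulate(s)
        print(f"sigma_beta={s}: merged={merged:.3f}, ensemble={ensemble:.3f}")
```

Under these assumptions, merging should come out ahead when `sigma_beta` is zero and the ensemble should come out ahead when `sigma_beta` is large. Where the two cross depends on the sample sizes, the number of predictors, and the noise level chosen here, so this sketch does not reproduce the paper's transition-point expressions.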

learner, ridge regression, transition point

Country:
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area
    - Oncology (1.00)
    - Endocrinology > Diabetes (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)
