Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation
Connor Lennon, Edward Rubin, Glen Waddell
Machine learning (ML) primarily evolved to solve "prediction problems." The first stage of two-stage least squares (2SLS) is a prediction problem, suggesting potential gains from ML first-stage assistance. However, little guidance exists on when ML helps 2SLS or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, as well as issues arising from their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
May-20-2025
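The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: the data-generating process, the function names, and the use of an in-sample k-nearest-neighbor average as a stand-in for a flexible nonlinear learner are all assumptions made for illustration. It compares the second-stage slope under endogenous OLS, a linear first stage, and the overfit nonlinear first stage, with a true coefficient of 1.0.

```python
import numpy as np

# Assumed toy data-generating process (not from the paper):
# z are instruments, u is an unobserved confounder, true beta = 1.0.
def simulate(n=1000, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 3))
    u = rng.normal(size=n)
    x = z @ np.array([1.0, 0.5, 0.5]) + u + rng.normal(size=n)
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return z, x, y

def slope(regressor, y):
    # Regress y on an intercept and one regressor; return the slope.
    X = np.column_stack([np.ones_like(regressor), regressor])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def linear_first_stage(z, x):
    # Classic 2SLS first stage: OLS fitted values of x on the instruments.
    Z = np.column_stack([np.ones(len(x)), z])
    coef = np.linalg.lstsq(Z, x, rcond=None)[0]
    return Z @ coef

def knn_first_stage(z, x, k=3):
    # Hypothetical stand-in for a flexible nonlinear learner: the in-sample
    # average of x over the k nearest points in z, own observation included.
    # Because each fitted value contains the unit's own x, it carries part
    # of the endogenous variation into the second stage.
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]
    return x[idx].mean(axis=1)

z, x, y = simulate()
results = {
    "ols": slope(x, y),
    "2sls_linear": slope(linear_first_stage(z, x), y),
    "2sls_knn": slope(knn_first_stage(z, x), y),
}
for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumptions the linear first stage should land close to the true coefficient, while the overfit first stage should drift toward the OLS estimate. The sketch isolates only the overfitting channel; it does not reproduce the paper's three-component bias decomposition.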
- Country:
- Asia
- China (0.04)
- Philippines (0.04)
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America > United States
- California (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- Oregon > Lane County
- Eugene (0.04)
- Genre:
- Instructional Material > Training Manual (0.40)
- Research Report (1.00)
- Industry:
- Banking & Finance > Economy (0.67)