Honesty in Causal Forests: When It Helps and When It Hurts

Yanfang Hou, Carlos Fernández-Loría

arXiv.org Machine Learning 

Causal forests have become a popular tool for estimating how treatment effects vary across individuals (Wager and Athey, 2018). They are used in a growing number of domains--including marketing, operations, economics, and public policy--to personalize interventions and inform targeting strategies. Since 2019, dozens of papers in INFORMS journals alone have applied causal forests to experimental or observational data (see Appendix C), often with the goal of estimating individual-level treatment effects.

The method builds on a familiar idea: instead of estimating a single average effect for the whole population, we split the population into subgroups based on observed features and estimate effects within each group. This is conceptually similar to how random forests estimate outcomes, except now the goal is to estimate causal effects. But there is a crucial modeling difference: unlike random forests, which typically use the full training data for both splitting and estimation, causal forests often divide the training data in two--using one part to decide how to form the subgroups, and the other to estimate effects within them. This practice, known as honest estimation, is meant to prevent overfitting and selection bias (Athey and Imbens, 2016). It is the default in widely used software packages such as grf (Athey et al., 2019) and EconML (Battocchi et al., 2019), and is commonly recommended in applied research. But is this default always a good idea?
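To make the idea of honest estimation concrete, the following is a minimal sketch of a single honest tree (not the full causal forest algorithm of Wager and Athey). It assumes a randomized experiment with 50/50 treatment assignment and synthetic data invented here for illustration: one half of the training data chooses the partition (via a tree fit to a transformed-outcome proxy for the effect), and the held-out half estimates a difference in means within each leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic randomized experiment (illustrative): the true effect is
# +2 when x > 0 and 0 otherwise.
n = 4000
X = rng.normal(size=(n, 1))
w = rng.integers(0, 2, size=n)             # random 50/50 treatment assignment
tau = np.where(X[:, 0] > 0, 2.0, 0.0)      # true heterogeneous effect
y = tau * w + rng.normal(size=n)           # observed outcome

# Honest split: one half picks the subgroups, the other half estimates effects.
half = n // 2
Xs, ws, ys = X[:half], w[:half], y[:half]  # splitting sample
Xe, we, ye = X[half:], w[half:], y[half:]  # estimation sample

# Transformed-outcome proxy: under 50/50 assignment, 2*(2w - 1)*y is an
# unbiased (if noisy) pointwise signal of the treatment effect.
proxy = 2.0 * (2 * ws - 1) * ys
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200).fit(Xs, proxy)

# Estimate effects leaf by leaf, using only data the tree never saw.
leaves = tree.apply(Xe)
effects = {}
for leaf in np.unique(leaves):
    m = leaves == leaf
    effects[leaf] = ye[m & (we == 1)].mean() - ye[m & (we == 0)].mean()

print(effects)  # per-leaf effect estimates from the held-out sample
```

A "dishonest" (adaptive) version would reuse the splitting sample for the within-leaf means, so leaves selected for extreme proxy values would also get inflated effect estimates; honesty breaks that dependence at the cost of estimating each leaf's effect from half as much data.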
