Frugal, Flexible, Faithful: Causal Data Simulation via Frengression
Linying Yang, Robin J. Evans, Xinwei Shen
The use of machine learning tools has given causal inference a new lease of life, enabling complex models to be used with principled causal estimators and guarantees about statistically important quantities (Wager and Athey, 2018; Chernozhukov et al., 2018; Hahn et al., 2020). To build trustworthy causal models, however, we also need to understand when these methods may be more or less reliable, or perhaps fail completely. This implies that causal inference needs a set of good benchmarking tools. Unfortunately, real-world datasets are not ideal for this task, because they cannot give us access to the ground truth other than in a few very special circumstances. In particular, they rarely provide the counterfactual outcomes we care about, and the distribution we want to evaluate often differs from the one that produced the observations. Well-designed simulations can address this discrepancy (Neal et al., 2020; Parikh et al., 2022); they allow us to choose a ground truth, stress-test new methods, compare their generalizability and stability, and expose failure modes before deployment.
Aug-5-2025
- Country:
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - Florida > Palm Beach County
      - Boca Raton (0.04)
  - Europe > United Kingdom
    - England
      - Oxfordshire > Oxford (0.04)
      - Cambridgeshire > Cambridge (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - Strength High (0.67)
- Industry: