Frugal, Flexible, Faithful: Causal Data Simulation via Frengression

Linying Yang, Robin J. Evans, Xinwei Shen

Aug-5-2025, arXiv.org Machine Learning

The use of machine learning tools has given causal inference a new lease of life, enabling complex models to be used with principled causal estimators and guarantees about statistically important quantities (Wager and Athey, 2018; Chernozhukov et al., 2018; Hahn et al., 2020). To build trustworthy causal models, however, we also need to understand when these methods may be more or less reliable, or perhaps fail completely. This implies that causal inference needs a set of good benchmarking tools. Unfortunately, real-world datasets are not ideal for this task, because they cannot give us access to the ground truth other than in a few very special circumstances. In particular, they rarely provide the counterfactual outcomes we care about, and the distribution we want to evaluate often differs from the one that produced the observations. Well-designed simulations can address this discrepancy (Neal et al., 2020; Parikh et al., 2022); they allow us to choose a ground truth, stress-test new methods, compare their generalizability and stability, and expose failure modes before deployment.



