Counterfactual Data Augmentation with Contrastive Learning

Aloui, Ahmed, Dong, Juncheng, Le, Cat P., Tarokh, Vahid

Nov-6-2023, arXiv.org Machine Learning

Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that, in the learned representation space, individuals identified as close by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.

One of the most significant challenges for Conditional Average Treatment Effect (CATE) estimation is the statistical disparity between distinct treatment groups (Goldsmith-Pinkham et al., 2022). While Randomized Controlled Trials (RCTs) mitigate this issue (Rubin, 1974; Imbens & Rubin, 2015), they can be expensive, unethical, and sometimes infeasible to conduct.
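
The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the representations have already been learned, and it substitutes a plain Euclidean distance with a threshold for the learned similarity measure. The function name `augment_with_counterfactuals`, the parameters `k` and `epsilon`, and the neighbor-averaging imputation rule are all hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def augment_with_counterfactuals(embeddings, treatments, outcomes, k=2, epsilon=0.5):
    """Impute counterfactual outcomes only for individuals that have at least
    k close neighbors in the alternative treatment group.

    Assumption: "close" means Euclidean distance <= epsilon in the embedding
    space; the paper instead uses a learned similarity measure.
    Returns a list of (index, counterfactual_treatment, imputed_outcome).
    """
    augmented = []
    for i, (z_i, t_i) in enumerate(zip(embeddings, treatments)):
        # Collect (distance, outcome) pairs from the other treatment group
        # that fall within the closeness threshold.
        neighbors = sorted(
            (euclidean(z_i, z_j), y_j)
            for z_j, t_j, y_j in zip(embeddings, treatments, outcomes)
            if t_j != t_i and euclidean(z_i, z_j) <= epsilon
        )
        if len(neighbors) < k:
            continue  # too few close neighbors: leave this individual out
        # Hypothetical imputation rule: average the k nearest outcomes.
        imputed = sum(y for _, y in neighbors[:k]) / k
        augmented.append((i, 1 - t_i, imputed))
    return augmented


# Toy data: individual 0 is treated and has two nearby control neighbors;
# the other individuals lack enough close neighbors in the opposite group.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
treatments = [1, 0, 0, 1]
outcomes = [3.0, 1.0, 2.0, 9.0]

print(augment_with_counterfactuals(embeddings, treatments, outcomes))
# -> [(0, 0, 1.5)]
```

Only individual 0 receives an imputed control outcome here. Skipping individuals without enough close neighbors mirrors the abstract's point that imputation is restricted to a selected subset so that the added samples carry little imputation error.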

artificial intelligence, deep learning, machine learning

Country:
- North America > United States
  - New York (0.04)
  - Tennessee > Davidson County
    - Nashville (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > China
  - Anhui Province > Hefei (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (0.94)
  - Neural Networks > Deep Learning (0.46)
