AITopics

2606.18969

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Industry:

Energy > Power Industry (0.34)
Health & Medicine > Health Care Providers & Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing Systems Jun-22-2026, 17:32:09 GMT

PUATE: Efficient ATE Estimation from Treated (Positive) and Unlabeled Units

The estimation of average treatment effects (ATEs), defined as the difference in expected outcomes between treatment and control groups, is a central topic in causal inference. This study develops semiparametric efficient estimators for the ATE in a setting where only a treatment group and an unlabeled group, consisting of units whose treatment status is unknown, are observed. This scenario constitutes a variant of learning from positive and unlabeled data (PU learning) and can be viewed as a special case of ATE estimation with missing data. For this setting, we derive the semiparametric efficiency bounds, which characterize the lowest achievable asymptotic variance for regular estimators. We then construct semiparametric efficient ATE estimators that attain these bounds. Our results contribute to the literature on causal inference with missing data and weakly supervised learning.
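To make the ATE definition above concrete, here is a minimal Python sketch of the naive difference-in-means estimate. It assumes the fully labeled case, where every unit's treatment status is known and treatment is as good as randomized. It is not the PU estimator developed in the paper, and the function name and toy data are hypothetical.

```python
from statistics import mean


def difference_in_means(outcomes, treated):
    """Naive ATE estimate: mean outcome of treated units minus mean outcome of controls.

    Assumes every unit has a known treatment indicator (1 = treated, 0 = control),
    which is exactly what the positive-unlabeled setting does NOT provide.
    """
    treated_y = [y for y, d in zip(outcomes, treated) if d == 1]
    control_y = [y for y, d in zip(outcomes, treated) if d == 0]
    if not treated_y or not control_y:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated_y) - mean(control_y)


# Hypothetical toy data: three treated units followed by three controls.
outcomes = [3.0, 5.0, 4.0, 1.0, 2.0, 3.0]
treated = [1, 1, 1, 0, 0, 0]
ate_hat = difference_in_means(outcomes, treated)
```

In the setting described above, the control mean cannot be computed this way because the unlabeled group mixes treated and untreated units, which is what motivates a dedicated estimator.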

artificial intelligence, data mining, machine learning, (20 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (0.71)
Education (0.67)
Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
(2 more...)

Dhawan, Nikita, Paruthi, Arnav, Kim, Andrew, Gondara, Lovedeep, Novikova, Jekaterina, Maddison, Chris J.

Causal Risk Minimization for High-Dimensional Treatments

arXiv.org Machine Learning May-27-2026

Predicting the effect of interventions with many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that drives movement in share price, is useful across several domains. However, classical causal estimators tend to assume that all possible interventions are observed, which is infeasible when interventions vary widely, for instance, in the space of all text strings. We adapt a well-known approach of recasting causal inference as a learning problem to address high-dimensional treatment spaces. Specifically, under standard assumptions like no unobserved confounding, we show that causal error decomposes into a series of moment-balancing errors of increasing order, and design objectives that directly improve causal estimation. We also show how to project the effect of a high-dimensional treatment onto lower-dimensional treatment attributes, which allows a single model to answer several causal questions without additional attribute-specific training. We empirically evaluate our estimators in settings with high-dimensional continuous, discrete, and text treatments, the last of which uses a semi-synthetic dataset of Amazon Reviews. Our experiments demonstrate the benefit of higher-order balance error optimization and show that projected causal estimates perform competitively with attribute-specific estimators.

artificial intelligence, machine learning, natural language, (16 more...)

2605.27281

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bertolini, Daniele, Fuller, Franklin, Smith, Aaron M., Walsh, Jonathan R., Zhuang, Run

Digital Twins as Synthetic Controls in Single-Arm Trials

arXiv.org Machine Learning May-14-2026

Single-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.

artificial intelligence, estimator, machine learning, (18 more...)

2605.12832

Country: North America > United States > California > San Francisco County > San Francisco (0.86)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Government > Regional Government > North America Government > United States Government > FDA (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bansak, Kirk, Paulson, Elisabeth, Rothenhäusler, Dominik, Ferwerda, Jeremy, Hainmueller, Jens, Hotard, Michael

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

arXiv.org Machine Learning May-11-2026

Previous research, beginning with Bansak et al. (2018), has investigated the potential of refugee matching for boosting refugee outcomes. This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. To estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and multiple variants of augmented inverse probability weighting (AIPW). We also consider various modifications, including alternative modeling architectures and different assignment procedures. The impact estimates remain consistent in magnitude in all scenarios and are statistically significant in most cases. Furthermore, the estimates are consistent with the results originally presented in Bansak et al. (2018).
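As an illustration of the two estimator families named above, the following Python sketch computes textbook IPW and AIPW estimates of the value of a target assignment policy from logged data. It assumes known logging propensities and outcome-model predictions supplied by the caller. All function names, argument names, and toy numbers are hypothetical, and this is not the authors' implementation.

```python
def ipw_value(actions, rewards, propensities, target_actions):
    """IPW estimate of a target policy's average outcome.

    Only units whose logged action matches the target policy's action contribute,
    each reweighted by the inverse probability of the logged action.
    """
    total = 0.0
    for a, y, p, t in zip(actions, rewards, propensities, target_actions):
        if a == t:
            total += y / p
    return total / len(rewards)


def aipw_value(actions, rewards, propensities, target_actions, mu_logged, mu_target):
    """AIPW estimate: outcome-model prediction plus an IPW-weighted residual correction.

    mu_logged[i] is the model's predicted outcome under the logged action,
    mu_target[i] the predicted outcome under the target policy's action.
    """
    total = 0.0
    for a, y, p, t, m_log, m_tgt in zip(
        actions, rewards, propensities, target_actions, mu_logged, mu_target
    ):
        total += m_tgt  # model-based term, used for every unit
        if a == t:
            total += (y - m_log) / p  # residual correction where actions agree
    return total / len(rewards)


# Hypothetical toy log: two locations "A" and "B", uniform logging propensities.
actions = ["A", "B", "A", "B"]
rewards = [1.0, 0.0, 1.0, 1.0]
propensities = [0.5, 0.5, 0.5, 0.5]
target_actions = ["A", "A", "B", "B"]
mu_logged = [0.8, 0.4, 0.8, 0.6]
mu_target = [0.8, 0.8, 0.6, 0.6]

v_ipw = ipw_value(actions, rewards, propensities, target_actions)
v_aipw = aipw_value(actions, rewards, propensities, target_actions, mu_logged, mu_target)
```

The AIPW form stays consistent if either the propensities or the outcome model is correct, which is one reason comparing it against plain IPW is a natural robustness check.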

artificial intelligence, assignment, machine learning, (14 more...)

2605.06686

Country: North America > United States (0.49)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Immigration & Customs (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.90)
Government > Regional Government (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Yang, Tianyu, Noor-E-Alam, Md.

A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching

arXiv.org Machine Learning May-8-2026

Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying causality remains challenging due to the potential for confounding and the distinction between correlation and causation. While recent advances in causal machine learning and matching algorithms have improved estimation accuracy, these methods often face trade-offs between interpretability and computational efficiency. This paper proposes a novel approach that combines a tree-based discretization technique, tailored for causal inference, with an integer linear programming-based matching algorithm. The discretization ensures approximately linear relationships for control datasets within strata, enabling effective matching, while the optimization framework targets global balance. The resulting algorithm is computationally efficient and yields less biased estimates of the average treatment effect on the treated (ATT) than state-of-the-art algorithms. Empirical evaluations demonstrate the proposed method's practical advantages over existing techniques in causal inference scenarios.

artificial intelligence, machine learning, optimization problem, (18 more...)

2604.27307

Country: North America > United States (0.68)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Public Health (0.46)
Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Neural Information Processing Systems Apr-25-2026, 00:29:14 GMT

Optimal Transport for Treatment Effect Estimation

Estimating the conditional average treatment effect from observational data is highly challenging due to the existence of treatment selection bias. Prevalent methods mitigate this issue by aligning distributions of different treatment groups in the latent space. However, there are two critical problems that these methods fail to address: (1) mini-batch sampling effects (MSE), which cause misalignment in non-ideal mini-batches with outcome imbalance and outliers; (2) unobserved confounder effects (UCE), which result in inaccurate discrepancy calculation due to the neglect of unobserved confounders. To tackle these problems, we propose a principled approach named Entire Space CounterFactual Regression (ESCFR), which is a new take on optimal transport in the context of causality. Specifically, based on the framework of stochastic optimal transport, we propose a relaxed mass-preserving regularizer to address the MSE issue and design a proximal factual outcome regularizer to handle the UCE issue. Extensive experiments demonstrate that our proposed ESCFR can successfully tackle the treatment selection bias and achieve significantly better performance than state-of-the-art methods.

artificial intelligence, data mining, machine learning, (18 more...)

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)

Neural Information Processing Systems Apr-24-2026, 17:29:38 GMT

Online Multi-Armed Bandits with Adaptive Inference

During online decision making in multi-armed bandits, one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected, thereby yielding non-i.i.d.

artificial intelligence, data mining, machine learning, (20 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.33)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Apr-24-2026, 11:08:30 GMT

0918183ced31affb7ce0345e45ac1943-Supplemental-Conference.pdf

We evaluate Okapi using three datasets (iWildCam, PovertyMap, and CivilComments) taken from the WILDS 2.0 benchmark [63]. These datasets were chosen specifically due to the poor performance reported by [63] for semi-supervised and domain adaptation methods across the board, in relation to the ERM baselines. For PovertyMap in particular, ERM was found to vastly outperform any competing methods utilising the unlabelled data and/or domain labels. For iWildCam, the task is multiclass species classification of animals in camera trap images. The dataset contains 1022K images of animals, each annotated with the domain, s, that identifies the camera trap that captured it.

artificial intelligence, encoder, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)