Goto

Collaborating Authors

 transportability



GeneralTransportabilityofSoftInterventions: CompletenessResults

Neural Information Processing Systems

In the causal inference literature, this generalization task has been formalized under the rubric oftransportability (Pearl and Bareinboim, 2011), where a number of criteria and algorithms have been developed forvarious settings.


Transportability for Bandits with Data from Different Environments

Neural Information Processing Systems

A unifying theme in the design of intelligent agents is to efficiently optimize a policy based on what prior knowledge of the problem is available and what actions can be taken to learn more about it. Bandits are a canonical instance of this task that has been intensely studied in the literature. Most methods, however, typically rely solely on an agent's experimentation in a single environment (or multiple closely related environments). In this paper, we relax this assumption and consider the design of bandit algorithms from a combination of batch data and qualitative assumptions about the relatedness across different environments, represented in the form of causal models. In particular, we show that it is possible to exploit invariances across environments, wherever they may occur in the underlying causal model, to consistently improve learning. The resulting bandit algorithm has a sub-linear regret bound with an explicit dependency on a term that captures how informative related environments are for the task at hand; and may have substantially lower regret than experimentation-only bandit instances.


General Transportability of Soft Interventions: Completeness Results

Neural Information Processing Systems

The challenge of generalizing causal knowledge across different environments is pervasive in scientific explorations, including in AI, ML, and Data Science. Experiments are usually performed in one environment (e.g., in a lab, on Earth) with the intent, almost invariably, of being used elsewhere (e.g., outside the lab, on Mars), where the conditions are likely to be different. In the causal inference literature, this generalization task has been formalized under the rubric of transportability (Pearl and Bareinboim, 2011), where a number of criteria and algorithms have been developed for various settings. Despite the generality of such results, transportability theory has been confined to atomic, do()-interventions. In practice, many real-world applications require more complex, stochastic interventions; for instance, in reinforcement learning, agents need to continuously adapt to the changing conditions of an uncertain and unknown environment. In this paper, we extend transportability theory to encompass these more complex types of interventions, which are known as soft, both relative to the input as well as the target distribution of the analysis. Specifically, we develop a graphical condition that is both necessary and sufficient for deciding soft-transportability. Second, we develop an algorithm to determine whether a non-atomic intervention is computable from a combination of the distributions available across domains. As a corollary, we show that the $\sigma$-calculus is complete for the task of soft-transportability.


Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Landesberg, Eddie

arXiv.org Machine Learning

LLM-as-judge evaluation has become the de facto standard for scaling model assessment, but the practice is statistically unsound: uncalibrated scores can invert preferences, naive confidence intervals on uncalibrated scores achieve near-0% coverage, and importance-weighted estimators collapse under limited overlap despite high effective sample size (ESS). We introduce Causal Judge Evaluation (CJE), a framework that fixes all three failures. On n=4,961 Chatbot Arena prompts (after filtering from 5k), CJE achieves 99% pairwise ranking accuracy at full sample size (94% averaged across configurations), matching oracle quality, at 14x lower cost (for ranking 5 policies) by calibrating a 16x cheaper judge on just 5% oracle labels (~250 labels). CJE combines three components: (i) AutoCal-R, reward calibration via mean-preserving isotonic regression; (ii) SIMCal-W, weight stabilization via stacking of S-monotone candidates; and (iii) Oracle-Uncertainty Aware (OUA) inference that propagates calibration uncertainty into confidence intervals. We formalize the Coverage-Limited Efficiency (CLE) diagnostic, which explains why IPS-style estimators fail even when ESS exceeds 90%: the logger rarely visits regions where target policies concentrate. Key findings: SNIPS inverts rankings even with reward calibration (38% pairwise, negative Kendall's tau) due to weight instability; calibrated IPS remains near-random (47%) despite weight stabilization, consistent with CLE; OUA improves coverage from near-0% to ~86% (Direct) and ~96% (stacked-DR), where naive intervals severely under-cover.


Transportability from Multiple Environments with Limited Experiments

Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl

Neural Information Processing Systems

This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power.




Transportability from Multiple Environments with Limited Experiments

Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl

Neural Information Processing Systems

This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power.


Prediction of Survival Outcomes under Clinical Presence Shift: A Joint Neural Network Architecture

Jeanselme, Vincent, Martin, Glen, Sperrin, Matthew, Peek, Niels, Tom, Brian, Barrett, Jessica

arXiv.org Artificial Intelligence

Electronic health records arise from the complex interaction between patients and the healthcare system. This observation process of interactions, referred to as clinical presence, often impacts observed outcomes. When using electronic health records to develop clinical prediction models, it is standard practice to overlook clinical presence, impacting performance and limiting the transportability of models when this interaction evolves. We propose a multi-task recurrent neural network that jointly models the inter-observation time and the missingness processes characterising this interaction in parallel to the survival outcome of interest. Our work formalises the concept of clinical presence shift when the prediction model is deployed in new settings (e.g. different hospitals, regions or countries), and we theoretically justify why the proposed joint modelling can improve transportability under changes in clinical presence. We demonstrate, in a real-world mortality prediction task in the MIMIC-III dataset, how the proposed strategy improves performance and transportability compared to state-of-the-art prediction models that do not incorporate the observation process. These results emphasise the importance of leveraging clinical presence to improve performance and create more transportable clinical prediction models.