doubly robust estimator
Handling Missing Responses under Cluster Dependence with Applications to Language Model Evaluation
Human annotations play a crucial role in evaluating the performance of GenAI models. Two common challenges in practice, however, are missing annotations (the response variable of interest) and cluster dependence among human-AI interactions (e.g., questions asked by the same user may be highly correlated). Reliable inference must address both issues to achieve unbiased estimation and appropriately quantify uncertainty when estimating average scores from human annotations. In this paper, we analyze the doubly robust estimator, a widely used method in missing data analysis and causal inference, applied to this setting and establish novel theoretical properties under cluster dependence. We further illustrate our findings through simulations and a real-world conversation quality dataset. Our theoretical and empirical results underscore the importance of incorporating cluster dependence in missing response problems to perform valid statistical inference.
Path-specific effects for pulse-oximetry guided decisions in critical care
Identifying and measuring biases associated with sensitive attributes is a crucial consideration in healthcare to prevent treatment disparities. One prominent issue is inaccurate pulse oximeter readings, which tend to overestimate oxygen saturation for dark-skinned patients and misrepresent supplemental oxygen needs. Most existing research has revealed statistical disparities linking device measurement errors to patient outcomes in intensive care units (ICUs) without causal formalization. This study causally investigates how racial discrepancies in oximetry measurements affect invasive ventilation in ICU settings. We employ a causal inference-based approach using path-specific effects to isolate the impact of bias by race on clinical decision-making.
Density Ratio-Free Doubly Robust Proxy Causal Learning
We study the problem of causal function estimation in the Proxy Causal Learning (PCL) framework, where confounders are not observed but proxies for the confounders are available. Two main approaches have been proposed: outcome bridge-based and treatment bridge-based methods. In this work, we propose two kernel-based doubly robust estimators that combine the strengths of both approaches, and naturally handle continuous and high-dimensional variables. Our identification strategy builds on a recent density ratio-free method for treatment bridge-based PCL; furthermore, in contrast to previous approaches, it does not require indicator functions or kernel smoothing over the treatment variable. These properties make it especially well-suited for continuous or high-dimensional treatments. By using kernel mean embeddings, we propose the first density-ratio free doubly robust estimators for proxy causal learning, which have closed form solutions and strong uniform consistency guarantees. Our estimators outperform existing methods on PCL benchmarks, including a prior doubly robust method that requires both kernel smoothing and density ratio estimation.
Digital Twins as Synthetic Controls in Single-Arm Trials
Bertolini, Daniele, Fuller, Franklin, Smith, Aaron M., Walsh, Jonathan R., Zhuang, Run
Single-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.
Measuring Differences between Conditional Distributions using Kernel Embeddings
Moskvichev, Peter, Chau, Siu Lun, Sejdinovic, Dino
Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While proposed methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods to measure divergence between conditional distributions through what we refer to as conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics which we call levels, with three special cases each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level $s$ CMMD, clarifying the required assumptions, and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior policy to evaluate and learn new policies, is crucial in applications where experimentation is limited such as medicine. We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous. Targeting deterministic policies, for which action is a deterministic function of state, is crucial since optimal policies are always deterministic (up to ties). In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist. To circumvent this issue, we propose several new doubly robust estimators based on different kernelization approaches. We analyze the asymptotic mean-squared error of each of these under mild rate conditions for nuisance estimators. Specifically, we demonstrate how to obtain a rate that is independent of the horizon length.