Locally minimax optimal and dimension-agnostic discrete argmin inference
Kim, Ilmun, Ramdas, Aaditya
We revisit the discrete argmin inference problem in high-dimensional settings. Given $n$ observations of a $d$-dimensional vector, the goal is to test whether the $r$th component of the mean vector is the smallest among all components. We propose dimension-agnostic tests that maintain validity regardless of how $d$ scales with $n$, and regardless of arbitrary ties in the mean vector. Notably, our validity holds under mild moment conditions, requiring little more than finiteness of a second moment, and permitting possibly strong dependence between coordinates. In addition, we establish the local minimax separation rate for this problem, which adapts to the cardinality of a confusion set, and show that the proposed tests attain this rate. Our method uses the sample-splitting and self-normalization approach of Kim and Ramdas (2024). Our tests can be easily inverted to yield confidence sets for the argmin index. Empirical results illustrate the strong performance of our approach in terms of type I error control and power compared to existing methods.
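For concreteness, here is a minimal sketch of a sample-splitting, self-normalized test of this kind; the even split, the single plug-in competitor chosen on the first fold, and the one-sided Gaussian cutoff are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import norm

def argmin_test(X, r, alpha=0.05, seed=0):
    """Test H0: mu_r is the smallest mean, via sample splitting.
    Fold 1 picks the strongest competitor to coordinate r;
    fold 2 forms a self-normalized (studentized) difference.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    fold1, fold2 = idx[: n // 2], idx[n // 2:]

    # Fold 1: estimate the competitor most likely to beat coordinate r.
    mu1 = X[fold1].mean(axis=0)
    s = np.argmin(np.delete(mu1, r))
    s = s + (s >= r)  # map back to the original coordinate index

    # Fold 2: studentized mean of the differences X_r - X_s.
    D = X[fold2, r] - X[fold2, s]
    T = np.sqrt(len(D)) * D.mean() / D.std(ddof=1)

    # Reject H0 (r is not the argmin) for large positive T.
    return T > norm.ppf(1 - alpha)

# Example: coordinate 0 truly the argmin -> the test should rarely reject.
X = np.random.default_rng(1).normal(size=(500, 10)) + np.arange(10) * 0.1
print(argmin_test(X, r=0))
```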
Online Selective Conformal Prediction: Errors and Solutions
Sale, Yusuf, Ramdas, Aaditya
In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. In this paper, we evaluate existing calibration selection strategies and pinpoint fundamental errors in the associated claims of selection-conditional coverage and false coverage rate (FCR) control. To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining the tradeoffs between valid methods.
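As a minimal illustration of the exchangeability-preserving idea, the sketch below calibrates each selected test point only on past points that pass the same selection rule; the thresholding rule, the residual scores, and the minimum calibration size are hypothetical choices, not the strategies analyzed in the paper.

```python
import numpy as np

def online_selective_conformal(scores, preds, select, alpha=0.1):
    """Online loop: at each time t, if the selection rule fires,
    calibrate on *past selected* points, so the selected test
    score stays exchangeable with its calibration scores.
    Sketch: `select` is a fixed rule applied identically to
    calibration candidates and to the test point."""
    intervals = []
    for t in range(len(scores)):
        if not select(preds[t]):
            intervals.append(None)  # no prediction issued
            continue
        # Calibration set: earlier points that the same rule selects.
        cal = [scores[i] for i in range(t) if select(preds[i])]
        if len(cal) < 10:
            intervals.append(None)  # too little calibration data
            continue
        k = int(np.ceil((1 - alpha) * (len(cal) + 1)))
        qhat = np.sort(cal)[min(k, len(cal)) - 1]
        intervals.append(qhat)  # radius of the prediction interval
    return intervals

rng = np.random.default_rng(0)
preds = rng.normal(size=1000)
scores = np.abs(rng.normal(size=1000))  # |y - yhat| residual scores
out = online_selective_conformal(scores, preds, select=lambda p: p > 0)
```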
Improving the statistical efficiency of cross-conformal prediction
Gasparin, Matteo, Ramdas, Aaditya
Conformal prediction has emerged as a general and versatile framework for constructing prediction sets in regression and classification tasks (Shafer and Vovk, 2008). Unlike traditional methods, which often depend on rigid distributional assumptions, conformal prediction transforms point predictions from any prediction (or black-box) algorithm into prediction sets with valid finite-sample marginal coverage. Originally introduced by Vovk et al. (2005), it has become increasingly influential, with numerous methods and extensions proposed since its introduction. In particular, full conformal prediction (Vovk et al., 2005) demonstrates favorable properties regarding the coverage and size of the prediction set. However, these advantages are counterbalanced by a substantial computational cost, which limits its practical application.
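To fix ideas, here is a minimal sketch of K-fold cross-conformal prediction, the computationally cheaper middle ground between split and full conformal; pooling the out-of-fold scores into a single quantile is one common combining rule, not the more statistically efficient aggregation studied in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def cross_conformal_interval(X, y, x_test, K=5, alpha=0.1, seed=0):
    """K-fold cross-conformal: each point is scored by the model
    trained without its fold, so every observation contributes a
    calibration score (unlike split conformal, which discards the
    training half). Pooling the scores is one standard combining
    rule; the paper studies more efficient ones."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    scores, test_preds = [], []
    for k in range(K):
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        model = LinearRegression().fit(X[tr], y[tr])
        scores.extend(np.abs(y[folds[k]] - model.predict(X[folds[k]])))
        test_preds.append(model.predict(x_test[None, :])[0])
    q = np.quantile(scores, 1 - alpha, method="higher")
    center = np.mean(test_preds)
    return center - q, center + q

X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + np.random.default_rng(2).normal(size=200)
print(cross_conformal_interval(X, y, x_test=np.zeros(3)))
```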
Post-detection inference for sequential changepoint localization
Saha, Aytijhya, Ramdas, Aaditya
This paper addresses a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We study the problem of localizing the changepoint using only the data observed up to a data-dependent stopping time at which a sequential detection algorithm $\mathcal A$ declares a change. We first construct confidence sets for the unknown changepoint when pre- and post-change distributions are assumed to be known. We then extend our framework to composite pre- and post-change scenarios. We impose no conditions on the observation space or on $\mathcal A$ -- we only need to be able to run $\mathcal A$ on simulated data sequences. In summary, this work offers both theoretically sound and practically effective tools for sequential changepoint localization.
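A minimal Monte Carlo sketch of the "only run $\mathcal A$ on simulated sequences" idea, with known pre- and post-change distributions, a toy CUSUM detector, and the stopping time itself as the test statistic; all three are illustrative stand-ins for the paper's construction.

```python
import numpy as np

def cusum_detector(x, drift=0.5, h=5.0):
    """Toy one-sided CUSUM; returns the (1-indexed) stopping time."""
    s = 0.0
    for i, xi in enumerate(x, 1):
        s = max(0.0, s + xi - drift)
        if s > h:
            return i
    return len(x)

def changepoint_conf_set(tau_obs, alpha=0.1, B=200, seed=0):
    """For each candidate changepoint t <= tau_obs, simulate B runs of
    the detector on data changing from N(0,1) to N(1,1) at time t, and
    keep t if the observed stopping time is not extreme among the
    simulated ones. Sketch: known pre/post distributions, a toy
    detector, and this particular statistic are illustrative choices."""
    rng = np.random.default_rng(seed)
    keep = []
    for t in range(1, tau_obs + 1):
        taus = [cusum_detector(np.concatenate([
                    rng.normal(0, 1, t - 1),                   # pre-change
                    rng.normal(1, 1, tau_obs + 50 - t + 1)]))  # post-change
                for _ in range(B)]
        lo, hi = np.quantile(taus, [alpha / 2, 1 - alpha / 2])
        if lo <= tau_obs <= hi:
            keep.append(t)
    return keep

print(changepoint_conf_set(tau_obs=30))
```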
Optimistic Algorithms for Adaptive Estimation of the Average Treatment Effect
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
Estimation and inference for the Average Treatment Effect (ATE) is a cornerstone of causal inference and often serves as the foundation for developing procedures for more complicated settings. Although traditionally analyzed in a batch setting, recent advances in martingale theory have paved the way for adaptive methods that can enhance the power of downstream inference. Despite these advances, progress in understanding and developing adaptive algorithms remains in its early stages. Existing work either focuses on asymptotic analyses that overlook the exploration-exploitation tradeoffs relevant in finite-sample regimes, or relies on simpler but suboptimal estimators. In this work, we address these limitations by studying adaptive sampling procedures that take advantage of the asymptotically optimal Augmented Inverse Probability Weighting (AIPW) estimator. Our analysis uncovers challenges obscured by asymptotic approaches and introduces a novel algorithmic design principle reminiscent of optimism in multi-armed bandits. This principled approach enables our algorithm to achieve significant theoretical and empirical gains compared to prior methods. Our findings mark a step forward in advancing adaptive causal inference methods in theory and practice.
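To ground the two ingredients named above, here is a minimal sketch combining the AIPW pseudo-outcome $\hat\mu_1 - \hat\mu_0 + A(Y - \hat\mu_1)/\pi - (1-A)(Y - \hat\mu_0)/(1-\pi)$ with an optimistic (UCB-inflated) choice of the sampling probability; the bonus form, the clipping range, and the running-mean outcome models are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
n_arm = np.ones(2)   # per-arm sample counts
mu = np.zeros(2)     # running outcome-model means (kept predictable)
s2 = np.ones(2)      # running second-moment estimates
phi = []

for t in range(T):
    # Optimistic allocation: inflate each arm's estimated standard
    # deviation with a UCB-style bonus so that uncertain arms are
    # sampled more; the bonus form is an illustrative assumption.
    sd_ucb = np.sqrt(s2) + np.sqrt(2 * np.log(t + 2) / n_arm)
    pi = float(np.clip(sd_ucb[1] / sd_ucb.sum(), 0.05, 0.95))

    a = int(rng.random() < pi)                    # assign treatment
    y = rng.normal([0.0, 1.0][a], [1.0, 2.0][a])  # observe outcome
    p = pi if a == 1 else 1 - pi

    # AIPW pseudo-outcome using the outcome models from *before* this
    # round (predictability preserves the martingale structure):
    # phi = mu1 - mu0 + a(y - mu1)/pi - (1 - a)(y - mu0)/(1 - pi).
    phi.append(mu[1] - mu[0] + (1 if a == 1 else -1) * (y - mu[a]) / p)

    n_arm[a] += 1
    mu[a] += (y - mu[a]) / n_arm[a]
    s2[a] += (y ** 2 - s2[a]) / n_arm[a]

print(np.mean(phi))   # ATE estimate; the true ATE here is 1.0
```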
Sharp Matrix Empirical Bernstein Inequalities
Wang, Hongjian, Ramdas, Aaditya
We present two sharp empirical Bernstein inequalities for symmetric random matrices with bounded eigenvalues. By sharp, we mean that both inequalities adapt to the unknown variance in a tight manner: the deviation captured by the first-order $1/\sqrt{n}$ term asymptotically matches the matrix Bernstein inequality exactly, including constants, even though the latter requires knowledge of the variance. Our first inequality holds for the sample mean of independent matrices, and our second holds for a mean estimator under martingale dependence at stopping times.
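As a rough illustration of what such a bound computes (not the paper's sharp inequalities, which improve the constants and extend to martingale dependence), here is an empirical-Bernstein-style radius that plugs a sample variance proxy into the classical matrix Bernstein shape.

```python
import numpy as np

def matrix_emp_bernstein_radius(mats, delta=0.05):
    """Empirical-Bernstein-style operator-norm radius for the mean of
    symmetric matrices with eigenvalues in [-1, 1]: a plug-in variance
    proxy inserted into the classical (known-variance) matrix
    Bernstein tail. Illustrative shape and constants only."""
    n, d, _ = mats.shape
    xbar = mats.mean(axis=0)
    devs = mats - xbar
    # Plug-in matrix variance: average of the squared centered matrices.
    vhat = np.einsum('nij,njk->ik', devs, devs) / n
    v = np.linalg.norm(vhat, ord=2)          # operator norm
    log_term = np.log(2 * d / delta)
    # First-order sqrt(1/n) term + lower-order 1/n term (range R = 2).
    return np.sqrt(2 * v * log_term / n) + 4 * log_term / (3 * n)

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 5, 5))
S = (A + A.transpose(0, 2, 1)) / 2
S /= np.linalg.norm(S, ord=2, axis=(1, 2))[:, None, None] + 1e-12
print(matrix_emp_bernstein_radius(S))
```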
Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect
Neopane, Ojash, Ramdas, Aaditya, Singh, Aarti
Estimation of the Average Treatment Effect (ATE) is a core problem in causal inference with strong connections to Off-Policy Evaluation in Reinforcement Learning. This paper considers the problem of adaptively selecting the treatment allocation probability in order to improve estimation of the ATE. The majority of prior work on adaptive ATE estimation focuses on asymptotic guarantees, and in turn overlooks important practical considerations such as the difficulty of learning the optimal treatment allocation as well as hyperparameter selection. Existing non-asymptotic methods are limited by poor empirical performance and exponential scaling of the Neyman regret with respect to problem parameters. In order to address these gaps, we propose and analyze the Clipped Second Moment Tracking (ClipSMT) algorithm, a variant of an existing algorithm with strong asymptotic optimality guarantees, and provide finite-sample bounds on its Neyman regret. Our analysis shows that ClipSMT achieves exponential improvements in Neyman regret on two fronts: improving the dependence on $T$ from $O(\sqrt{T})$ to $O(\log T)$, and reducing the exponential dependence on problem parameters to a polynomial one. Finally, we conclude with simulations which show the marked improvement of ClipSMT over existing approaches.
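The core loop is simple to sketch: track each arm's second moment and play a clipped plug-in of the Neyman-optimal allocation $\pi^* = \sqrt{\mathbb{E}[Y(1)^2]}\,/\,(\sqrt{\mathbb{E}[Y(0)^2]} + \sqrt{\mathbb{E}[Y(1)^2]})$ for the IPW estimator; the clipping schedule below is an illustrative choice, not the sequence the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
m = np.ones(2)       # running second-moment estimates E[Y(a)^2]
n_arm = np.ones(2)
ipw_terms = []

for t in range(1, T + 1):
    # Plug-in Neyman allocation for the IPW estimator ...
    pi = np.sqrt(m[1]) / (np.sqrt(m[0]) + np.sqrt(m[1]))
    # ... clipped away from {0, 1} at a decaying rate (an illustrative
    # schedule; the paper analyzes a specific clipping sequence).
    c_t = 0.5 * t ** (-1 / 3)
    pi = float(np.clip(pi, c_t, 1 - c_t))

    a = int(rng.random() < pi)
    y = rng.normal([0.5, 1.5][a], [1.0, 2.0][a])
    n_arm[a] += 1
    m[a] += (y ** 2 - m[a]) / n_arm[a]   # second-moment tracking
    p = pi if a == 1 else 1 - pi
    ipw_terms.append((1 if a == 1 else -1) * y / p)

print(np.mean(ipw_terms))   # ATE estimate; the true ATE here is 1.0
```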
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Zhao, Michelle, Simmons, Reid, Admoni, Henny, Ramdas, Aaditya, Bajcsy, Andrea
In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn't) present because of changes in the expert's policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior.
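A minimal sketch of the quantile-tracking idea behind IQT: an online pinball-style update toward the $(1-\alpha)$-quantile of the conformity scores, applied only when a label arrives and importance-weighted by the (here known and constant) label probability; the paper's probabilistic model of intermittent labels is richer than this.

```python
import numpy as np

def iqt(scores, observed, p_label, alpha=0.1, eta=0.05, q0=1.0):
    """Intermittent quantile tracking (sketch): an online quantile
    update targeting (1 - alpha)-coverage, applied only on steps with
    expert labels and scaled by 1/p_label so that coverage is tracked
    on average over all steps, labeled or not."""
    q, qs = q0, []
    for s, obs in zip(scores, observed):
        qs.append(q)
        if obs:  # label (and hence conformity score) available
            err = float(s > q)                 # 1 if miscovered
            q += (eta / p_label) * (err - alpha)
        # no update on unlabeled steps
    return np.array(qs)

rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=5000))
observed = rng.random(5000) < 0.3      # labels arrive 30% of the time
qs = iqt(scores, observed, p_label=0.3)
print(np.mean(scores < qs))            # empirical coverage, approx 0.9
```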
$\beta$-calibration of Language Model Confidence Scores for Generative QA
Manggala, Putra, Mastakouri, Atalanti, Kirschbaum, Elke, Kasiviswanathan, Shiva Prasad, Ramdas, Aaditya
To use generative question-and-answering (QA) systems for decision-making and in any critical application, these systems need to provide well-calibrated confidence scores that reflect the correctness of their answers. Existing calibration methods aim to ensure that the confidence score is on average indicative of the likelihood that the answer is correct. We argue, however, that this standard (average-case) notion of calibration is difficult to interpret for decision-making in generative QA. To address this, we generalize the standard notion of average calibration and introduce $\beta$-calibration, which ensures calibration holds across different question-and-answer groups. We then propose discretized posthoc calibration schemes for achieving $\beta$-calibration.
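To illustrate one discretized posthoc scheme in this spirit, the sketch below performs histogram-binning recalibration separately within each group, so the recalibrated scores match empirical accuracy per (group, bin) rather than only on average; the grouping, the bin count, and binning itself are illustrative assumptions, not the paper's schemes.

```python
import numpy as np

def groupwise_histogram_binning(conf, correct, groups, n_bins=10):
    """Discretized posthoc calibration (sketch): within each group,
    replace every raw confidence score by the empirical accuracy of
    its confidence bin, targeting calibration *per group* rather
    than only marginally."""
    conf = np.asarray(conf, float)
    correct = np.asarray(correct, float)
    edges = np.linspace(0, 1, n_bins + 1)
    calibrated = np.empty_like(conf)
    for g in np.unique(groups):
        gmask = np.asarray(groups) == g
        bins = np.clip(np.digitize(conf[gmask], edges) - 1, 0, n_bins - 1)
        for b in range(n_bins):
            bmask = bins == b
            if bmask.any():
                # bin value = empirical accuracy within (group, bin)
                idx = np.flatnonzero(gmask)[bmask]
                calibrated[idx] = correct[gmask][bmask].mean()
    return calibrated

rng = np.random.default_rng(0)
conf = rng.random(2000)
groups = rng.integers(0, 3, 2000)        # e.g., question categories
correct = rng.random(2000) < conf ** 2   # deliberately miscalibrated
print(groupwise_histogram_binning(conf, correct, groups)[:5])
```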
Sequential Kernelized Stein Discrepancy
Martinez-Taboada, Diego, Ramdas, Aaditya
We present a sequential version of the kernelized Stein discrepancy, which allows goodness-of-fit tests for unnormalized densities to be continuously monitored and adaptively stopped. That is, the sample size need not be fixed prior to data collection; the practitioner can choose whether to stop the test or continue to gather evidence at any time, while controlling the false discovery rate. In stark contrast to related literature, we do not impose uniform boundedness on the Stein kernel. Instead, we exploit the potential boundedness of the Stein kernel at arbitrary point evaluations to define test martingales, which give rise to the proposed sequential tests. We prove the validity of the test, as well as an asymptotic lower bound on the logarithmic growth of the wealth process under the alternative. We further illustrate the empirical performance of the test on a variety of distributions, including restricted Boltzmann machines.
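The martingale construction can be sketched concretely in one dimension: for a standard normal null with a unit-bandwidth Gaussian base kernel, the Stein kernel has mean zero under the null at any fixed second argument, so betting on its average against past points yields a test martingale; the grid-based bound and the constant betting fraction below are illustrative simplifications of the paper's scheme.

```python
import numpy as np

def stein_kernel_gauss(x, y):
    """Stein kernel u_p(x, y) for the standard normal null p = N(0,1)
    with a unit-bandwidth Gaussian base kernel. The Stein identity
    gives E_{X~p}[u_p(X, y)] = 0 for every fixed y."""
    d = x - y
    return np.exp(-d ** 2 / 2) * (x * y + 1 - 2 * d ** 2)

def sequential_ksd_wealth(xs):
    """Testing by betting (sketch). The payoff at time t,
    f_t(x) = mean_i u_p(x, x_i) over past points, has conditional mean
    zero under H0, and sup_x |f_t(x)| is finite because u_p(., y) is
    bounded for each fixed y; we approximate that bound on a grid to
    pick a predictable bet that keeps the wealth positive. The fixed
    fraction 0.5/B is an illustrative betting scheme. Reject at level
    alpha once the wealth reaches 1/alpha (Ville's inequality)."""
    grid = np.linspace(-10, 10, 201)
    S = stein_kernel_gauss(grid, xs[0])   # running sum of u(grid, x_i)
    W, wealth = 1.0, []
    for t in range(1, len(xs)):
        B = np.abs(S).max() / t + 1e-12   # approx bound on |f_t|
        payoff = stein_kernel_gauss(xs[t], xs[:t]).mean()
        W *= 1 + (0.5 / B) * payoff
        wealth.append(W)
        S += stein_kernel_gauss(grid, xs[t])
    return np.array(wealth)

rng = np.random.default_rng(0)
print(sequential_ksd_wealth(rng.normal(0.0, 1, 500))[-1])  # H0: stays small
print(sequential_ksd_wealth(rng.normal(0.7, 1, 500))[-1])  # H1: typically grows
```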