Goto

Collaborating Authors

 lfdr


Calibration without labels in multiple testing

arXiv.org Machine Learning

Large-scale hypothesis testing supports probability claims about individual hypotheses, as in empirical Bayes methods for estimating local false discovery rates. We study how such claims can be interpreted as approximately calibrated forecasts of the null hypothesis, yielding interpretable error probabilities even under model misspecification. Our approach draws conceptual inspiration from probabilistic forecasting but addresses a different challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed, so calibration must be assessed stochastically and established indirectly. We address this challenge by constructing a set of pseudo-labels, derived from the spacings of ordered $p$-values, which have the local false discovery rate as their regression target. Our construction unlocks existing tools for assessing and performing post-hoc calibration in multiple testing. Notably, we find on a large-scale empirical survey of published psychology and neuroscience literature that the $q$-value, a popular error measure based on the false discovery rate, can be severely miscalibrated.


Finite Resources False Discovery Rate Control in Structured Hypothesis Spaces

arXiv.org Machine Learning

Scientific discovery relies on large-scale hypothesis testing. However, the capacity to identify true discoveries while controlling false discovery faces major challenges: obtaining relevant reference data (the null distribution) is resource-intensive, leaving finite-data uncertainty, and the procedure should account for the inherent structure in the hypothesis space, when such structure exists. Here, we present a framework for controlling the false discovery rate both when each hypothesis is evidenced only by a finite count of null draws, leaving its p-value uncertain, and when the hypothesis space carries arbitrary structure, requiring only that the structure be represented through a suitable reproducing kernel. We present two decision rules that are both robust to structural mis-specification, yet offer a distinct trade-off between exact FDR control and statistical power. The first rule guarantees exact FDR control; the second maximizes power by adapting mirror-statistic control into count space, utilizing an analytical framework to assess FDR control when exact mirror symmetry is relaxed. Furthermore, the tractability gained by the RKHS framework allows us to directly investigate finite-data uncertainties, which we leverage to suggest a policy for the efficient allocation of null distribution samples.


Online selective conformal inference: adaptive scores, convergence rate and optimality

arXiv.org Machine Learning

In a supervised online setting, quantifying uncertainty has been proposed in the seminal work of \cite{gibbs2021adaptive}. For any given point-prediction algorithm, their method (ACI) produces a conformal prediction set with an average missed coverage getting close to a pre-specified level $α$ for a long time horizon. We introduce an extended version of this algorithm, called OnlineSCI, allowing the user to additionally select times where such an inference should be made. OnlineSCI encompasses several prominent online selective tasks, such as building prediction intervals for extreme outcomes, classification with abstention, and online testing. While OnlineSCI controls the average missed coverage on the selected in an adversarial setting, our theoretical results also show that it controls the instantaneous error rate (IER) at the selected times, up to a non-asymptotical remainder term. Importantly, our theory covers the case where OnlineSCI updates the point-prediction algorithm at each time step, a property which we refer to as {\it adaptive} capability. We show that the adaptive versions of OnlineSCI can convergence to an optimal solution and provide an explicit convergence rate in each of the aforementioned application cases, under specific mild conditions. Finally, the favorable behavior of OnlineSCI in practice is illustrated by numerical experiments.


Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests

arXiv.org Machine Learning

A/B testers conducting large-scale tests prioritize lifts and want to be able to control false rejections of the null. This work develops a decision-theoretic framework for maximizing profits subject to false discovery rate (FDR) control. We build an empirical Bayes solution for the problem via the greedy knapsack approach. We derive an oracle rule based on ranking the ratio of expected lifts and the cost of wrong rejections using the local false discovery rate (lfdr) statistic. Our oracle decision rule is valid and optimal for large-scale tests. Further, we establish asymptotic validity for the data-driven procedure and demonstrate finite-sample validity in experimental studies. We also demonstrate the merit of the proposed method over other FDR control methods. Finally, we discuss an application to actual Optimizely experiments.