inequality
Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning
Cesari, Tommaso, Colomboni, Roberto
We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $Δ_{\min}$, the known lower bound is of order $ \frac{\log K}{\min\{Δ_{\min},\varepsilon\}}, $ or equivalently, up to universal constants, of order \[ \frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
Low Rank for Rank: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons
Li, Jiachun, Simchi-Levi, David, Sun, Will Wei
Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each fine-grained task independently is unstable under sparse, imbalanced comparisons. We propose a low-rank framework for task-specific LLM ranking from sparse pairwise comparisons, modeling the task-by-model ability matrix $Θ^\star \in \mathbb{R}^{d_t \times d_m}$ as low rank so that information is shared across related tasks while task-specific differences are preserved. We first develop a max-norm ($\ell_\infty$) accurate estimator for the latent scores, combining a convex initializer with alternating-minimization refinement, and prove task-wise top-$K$ recovery guarantees under sparse sampling. Our main contribution is an uncertainty quantification framework for task-specific ranking. We construct cross-fitted one-step debiased estimators for fixed score contrasts -- such as the task-specific ability gap between two models -- yielding asymptotically valid confidence intervals that attain the semiparametric efficiency bound. We then extend the inference to the high-dimensional ranking regime, where per-task ranks and top-$K$ membership are determined by many dependent score-gap hypotheses. Using Gaussian and multiplier-bootstrap calibration, we obtain simultaneous confidence sets for per-task ranks and valid top-$K$ membership tests across many tasks and models. Experiments on synthetic data and Chatbot Arena show that low-rank sharing improves sample efficiency over independent task-wise Bradley-Terry estimation and produces tighter, better-calibrated ranking certificates, with the largest gains in the sparse regime typical of real LLM benchmarks.
Wasserstein Contraction of Coordinate Ascent Variational Inference
Caprio, Rocco, Corenflos, Adrien, Power, Sam
Finding approximations to an intractable probability distribution π of interest (usually known only up to a normalizing constant) is a key problem in scientific computing. Variational Inference stands out as a particularly attractive tool for this task, owing to its statistical and computational efficiency, and it has been the framework underlying many advances in computational statistics over the past half century (Parisi, 1980; Hinton and Van Camp, 1993; Jordan et al., 1999; Bishop and Nasrabadi, 2006). The central idea is to seek a tractable approximation to π within a chosen family of tractable distributions Q by minimizing a divergence to π over that'variational' family. Often, it is convenient or well-motivated to work with the family of product (or tensor, or factorized) distributions Q = P m, and define optimality through minimisation of the Kullback-Leibler (KL) divergence (also'relative entropy') min KL(ϱ||π): ϱ P m . A key practical aspect of working with this particular loss function is that in solving the associated optimisation problem, one is only required to compute expectations under the tractable variational distribution ϱ, rather than under the intractable target distribution π. In Bayesian statistics, π typically represents the joint posterior distribution of latent variables z Z and some parameters β B given observed data y Y. In these cases, we often choose m = 2 and seek the best variational approximation µ(dz) ν(dβ) to π to solve min KL(µ ν||π): µ P(Z), ν P(B) . The coordinate ascent variational inference algorithm (CAVI, Bishop and Nasrabadi, 2006; Blei et al., 2017) solves this problem by iteratively minimizing the Kullback-Leibler divergence with respect to one element at a time: given a starting point ν0, it iterates µk:= argmin
Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion
Mehrotra, Anay, Tran, Phuc, Vu, Van H., Zampetakis, Manolis
A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.
Learning to target with network interference
Wang, Xiaomeng, Bastani, Hamsa, Bastani, Osbert, Ren, Zhimei
This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each individual's outcome can be affected by at most a few others. We first establish a regret lower bound showing that ignoring the network structure and reducing the problem to a standard linear bandit inevitably leads to inefficient learning, particularly in large populations. To understand how structural information can be leveraged, we analyze regimes with varying levels of knowledge of the interference structure: (1) full support knowledge, (2) knowledge of the column support sizes, and (3) no prior knowledge. For each regime, we establish regret lower bounds characterizing the fundamental limits of learning, and develop algorithms that achieve near-optimal regret. Together, our results provide a unified view of how knowledge of the interference structure governs the efficiency of online learning under interference, and offer practical adaptive targeting algorithms in each setting. Numerical experiments on synthetic and real-world data demonstrate the practical benefits of our algorithms.
Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy
Liu, Huikang, Selvi, Aras, Wiesemann, Wolfram
We design a class of additive noise mechanisms that satisfy \((\varepsilon, δ)\)-differential privacy (DP) for scalar, real-valued query functions with known sensitivities, with a particular focus on moderate and low-privacy regimes. These mechanisms, which we call \textit{mixture mechanisms}, are constructed by mixing multiple Gaussian distributions that share the same variance but differ in their means and mixture weights. The resulting distributions can be interpreted as convex combinations of a zero-mean Gaussian (as used in the analytic Gaussian mechanism) and additional Gaussians whose means depend on the sensitivity of the query function. We derive tight conditions on the variances required for \((\varepsilon, δ)\)-DP and provide efficient algorithms to compute them. Compared to the analytic Gaussian mechanism, our mechanisms yield substantially lower expected noise amplitudes (\(l_1\)-loss) and variances (\(l_2\)-loss for zero-mean distributions). In the low-privacy regime that motivates our design, our mechanisms approach optimality, mitigating nearly all of the optimality gap of the analytic Gaussian mechanism.
Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback
Heymann, Benjamin, Sakhi, Otmane
We study the problem of learning to bid when the bidder's value is dynamic, i.e., when the current value depends on past outcomes. Specifically, we consider a bidder participating in repeated second-price auctions whose value depends on the time elapsed since their last successful bid, with auctions arriving in continuous time and only aggregated feedback revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence bound algorithm learns the optimal policy with a near optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives, and $\widetilde{O}(N^{1/3})$ for general, smooth primitives, achieving these regrets without explicit randomization. These theoretical results are supported by numerical experiments.
Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation
Kim, Wonyoung, Oh, Min-Hwan, Iyengar, Garud, Zeevi, Assaf
Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds. Our algorithm is computationally efficient and achieves the instance-wise optimal rate of regret, narrowing the gap between upper and lower bounds. Our numerical experiments validate that our method learns optimal policies more efficiently than conventional approaches.
Conservative neural posterior estimation via distributionally robust training
Laplante, William, Hikida, Yuga, Dellaporta, Charita, Briol, François-Xavier, Bharti, Ayush
Simulation-based inference (SBI; Cranmer et al., 2020) is a powerful framework for inferring parameters of scientific models whose likelihood functions are unavailable or computationally prohibitive to evaluate, but for which simulating data is straightforward. The use of flexible neural conditional density estimators has substantially expanded the applicability of SBI to challenging problems, especially in fields such as particle physics (Brehmer, 2021), cognitive neuroscience (Fengler et al., 2021), economics (Dyer et al., 2024) and cosmology (Alsing et al., 2018; Jeffrey et al., 2021). Neural SBI methods rely on simulations from the scientific model to approximate intractable quantities such as the posterior, the likelihood, the likelihood-to-evidence ratio, or the score function; see Zammit-Mangion et al. (2024) for a recent review. In this work, we focus on the widely used neural posterior estimation (NPE) method (Papamakarios and Murray, 2016; Radev et al., 2022). A central practical limitation of NPE is the simulation budget required to train the conditional density estimator. As many scientific simulators are expensive to run, generating a sufficiently large training set is often the main computational bottleneck.
Burnham accuses Blair of ignoring inequality as he hits back at ex-PM
Andy Burnham has accused former Labour Prime Minister Sir Tony Blair of failing to understand what's going on in people's lives and underestimating the impact of inequality. Sir Tony used a 5,600 word essay to argue the Labour government had no coherent plan for the country and had introduced policies that had held back business. He urged Labour not to move to the left and to embrace the radical centre instead. But Burnham, who is widely expected to challenge Sir Keir Starmer for the Labour leadership if he wins a by-election next month, told the Observer Sir Tony doesn't mention inequality once in his critique of where the Labour government has gone wrong. If you don't get how that's driving politics now, if you are not rooting your analysis in the fact that people are unable to live and that things that were taken for granted are no longer affordable, then you are not understanding what's going on, said the mayor of Greater Manchester.