dependence
Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference
We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at a rate that is nearly independent of any explicit dimension dependence. Specifically, for a d-dimensional strongly log-concave and log-smooth target, the number of iterations for BBVI with a sub-Gaussian family to obtain a solution ฯต-close to the global optimum has an explicit dimension dependence no larger than O(logd). This is a significant improvement over the O(d)dependence of full-rank locationscale families. For heavy-tailed families, we prove a weaker O(d2/k)dependence, where kis the number of finite moments of the family. Additionally, if the Hessian of the target log-density is constant, the complexity is free of any explicit dimension dependence. We also prove that our bound on the gradient variance, which is key to our result, cannot be improved using only spectral bounds on the Hessian of the target log-density.
Constrained Best Arm Identification
In real-world decision-making problems, one needs to pick among multiple policies the one that performs best while respecting economic constraints. This motivates the problem of constrained best-arm identification for bandit problems where every arm is a joint distribution of reward and cost. We investigate the general case where reward and cost are dependent. The goal is to accurately identify the arm with the highest mean reward among all arms whose mean cost is below a given threshold. We prove information-theoretic lower bounds on the sample complexity for three models: Gaussian with fixed covariance, Gaussian with unknown covariance, and non-parametric distributions of rectangular support. We propose a combination of a sampling and a stopping rule that correctly identifies the constrained best arm and matches the optimal sample complexities for each of the three models. Simulations demonstrate the performance of our algorithms.
Agents Robust to Distribution Shifts Learn Causal World Models Even Under Mediation
In this work, we prove that agents capable of adapting to distribution shifts must have learned the causal model of their environment even in the presence of mediation. This term describes situations where an agent's actions affect its environment, a dynamic common to most real-world settings. For example, a robot in an industrial plant might interact with tools, move through space, and transform products to complete its task. We introduce an algorithm for eliciting causal knowledge from robust agents using optimal policy oracles, with the flexibility to incorporate prior causal knowledge. We further demonstrate its effectiveness in mediated singleagent scenarios and multi-agent environments. We identify conditions under which the presence of a single robust agent is sufficient to recover the full causal model and derive optimal policies for other agents in the same environment. Finally, we show how to apply these results to sequential decision-making tasks modeled as Partially Observable Markov Decision Processes (POMDPs).
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a nonlinear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with O(1)time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
Handling Missing Responses under Cluster Dependence with Applications to Language Model Evaluation
Human annotations play a crucial role in evaluating the performance of GenAI models. Two common challenges in practice, however, are missing annotations (the response variable of interest) and cluster dependence among human-AI interactions (e.g., questions asked by the same user may be highly correlated). Reliable inference must address both issues to achieve unbiased estimation and appropriately quantify uncertainty when estimating average scores from human annotations. In this paper, we analyze the doubly robust estimator, a widely used method in missing data analysis and causal inference, applied to this setting and establish novel theoretical properties under cluster dependence. We further illustrate our findings through simulations and a real-world conversation quality dataset. Our theoretical and empirical results underscore the importance of incorporating cluster dependence in missing response problems to perform valid statistical inference.
Replicable Online pricing
We explore the concept of replicability, which ensures algorithmic consistency despite input data variations, for online pricing problems, specifically prophet inequalities and delegation. Given the crucial role of replicability in enhancing transparency in economic decision-making, we present a replicable and nearly optimal pricing strategy for prophet inequalities, achieving a sample complexity of poly(log |X|), where X is the ground set of distributions. Furthermore, we extend these findings to the delegation problem and establish lower bound that proves the necessity of the log |X| dependence. En route to obtaining these results, we develop a number of technical contributions which are of independent interest. Most notably, we propose a new algorithm for a variant of the heavy hitter problem, which has a nearly linear dependence on the inverse of the heavy hitter parameter, significantly improving upon existing results which have a cubic dependence.
Odor presentation ABCDEOdOdT1T2T5T4T1T1T1T1T134567orors
Understanding the evolving dependence between two sets of multivariate signals is fundamental in neuroscience and other domains where sub-networks in a system interact dynamically over time. Despite the growing interest in multivariate time series analysis, existing methods for between-clusters dependence typically rely on the assumption of stationarity and lack the temporal resolution to capture transient, frequency-specific interactions. To overcome this limitation, we propose scale-specific wavelet canonical coherence (WaveCanCoh), a novel framework that extends canonical coherence analysis to the nonstationary setting by leveraging the multivariate locally stationary wavelet model. The proposed WaveCanCoh enables the estimation of time-varying canonical coherence between clusters, providing interpretable insight into scale-specific time-varying interactions between clusters. Through extensive simulation studies, we demonstrate that WaveCanCoh accurately recovers true coherence structures under both locally stationary and general nonstationary conditions. Application to local field potential (LFP) activity data recorded from the hippocampus reveals distinct dynamic coherence patterns between correct and incorrect memory-guided decisions, illustrating the capacity of the method to detect behaviorally relevant neural coordination.
Breaking the Order Barrier: Off-Policy Evaluation for Confounded POMDPs
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs) with unobserved confounding. Recent advances have introduced bridge-function to circumvent unmeasured confounding and develop estimators for the policy value, yet the statistical error bounds of them related to the length of horizon T and the size of the state-action space |O||A| remain largely unexplored. In this paper, we systematically investigate the finite-sample error bounds of OPE estimators in finite-horizon tabular confounded POMDPs. Specifically, we show that under certain rank conditions, the estimation error for policy value can achieve a rate of O(T1.5/ n), excluding the cardinality of the observation space |O| and the action space |A|. With an additional mild condition on the concentrability coefficients in confounded POMDPs, the rate of estimation error can be improved to O(T/ n).
Sketched Adaptive Distributed Deep Learning: ASharp Convergence Analysis
Combining gradient compression with adaptive optimizers is a highly desirable goal in distributed learning, with potential benefits in both fewer communication rounds and less per-round communication. In spite of preliminary empirical promise, certain major challenges in the convergence analysis of such methods have stayed open: handling compression based approximation of both first and second moments (pre-conditioner) which appear as a ratio; avoiding dependence on the number of parameters, which is extremely large in modern deep models; and providing high-probability guarantees instead of in-expectation, which can hide high variance behavior. In this work, we introduce a family of Sketched Adaptive Distributed Learning (SADL) algorithms which can use suitable unbiased gradient sketching for compression with suitable adaptive optimization algorithms. As our main contribution, we provide theoretical convergence guarantees of SADL algorithms which addresses all of the existing challenges. In particular, our guarantees hold with high probability, picks up only a logarithmic dependence on the number of parameters, and the first and second moment approximation is handled precisely yielding a dependence on the intrinsic dimension of the loss Hessian, which is significantly smaller than the full dimensionality of deep learning models. Empirically, the SADL algorithms are shown to be competitive with and often outperform baselines on both vision and language tasks, in both supervised fine-tuning and training-from-scratch regimes. Further, the SADL algorithms are also competitive with the state-of-the-art communication-efficient distributed learning algorithms based on error feedback.