inequality hold
Graph Alignment via Birkhoff Relaxation
We consider the graph alignment problem, wherein the objective is to find a vertex correspondence between two graphs that maximizes the edge overlap. The graph alignment problem is an instance of the quadratic assignment problem (QAP), known to be NP-hard in the worst case even to approximately solve. In this paper, we analyze Birkhoff relaxation, a tight convex relaxation of QAP, and present theoretical guarantees on its performance when the inputs follow the Gaussian Wigner Model. More specifically, the weighted adjacency matrices are correlated Gaussian Orthogonal Ensemble with correlation 1/ 1+ฯ2 .
ABayesian Approach to Contextual Dynamic Pricing using the Proportional Hazards Model with Discrete Price Data
Dynamic pricing algorithms typically assume continuous price variables, which may not reflect real-world scenarios where prices are often discrete. This paper demonstrates that leveraging discrete price information within a semi-parametric model can substantially improve performance, depending on the size of the support set of the price variable relative to the time horizon. Specifically, we propose a novel semi-parametric contextual dynamic pricing algorithm, namely BayesCoxCP, based on a Bayesian approach to the Cox proportional hazards model. Our theoretical analysis establishes high-probability regret bounds that adapt to the sparsity level ฮณ, proving that our algorithm achieves a regret upper bound of eO(T(1+ฮณ)/2 + dT) for ฮณ < 1/3 and eO(T2/3 + dT) for ฮณ 1/3, where ฮณ represents the sparsity of the price grid relative to the time horizon T. Through numerical experiments, we demonstrate that our proposed algorithm significantly outperforms an existing method, particularly in scenarios with sparse discrete price points.
Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation
Kim, Wonyoung, Oh, Min-Hwan, Iyengar, Garud, Zeevi, Assaf
Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds. Our algorithm is computationally efficient and achieves the instance-wise optimal rate of regret, narrowing the gap between upper and lower bounds. Our numerical experiments validate that our method learns optimal policies more efficiently than conventional approaches.
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $ฮด\in (0,1]$, thereby characterizing the regret distribution across the full range of $ฮด$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/ฮด))$, confirming the conjecture of Lattimore & Szepesvรกri (2020, Section 17.1) for the first time.
The Bernstein-von Mises theorem for Bayesian one-pass online learning
Lee, Jeyong, Choi, Junhyeok, Kim, Dongguen, Chae, Minwoo
Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.
Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional doubly stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order logT in both sub-Gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order T logT up to a logT factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.