Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Dec-23-2025, 17:33:33 GMT–Neural Information Processing Systems

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees, yet it is challenging because variances are often not known a priori. Zhang et al. (2021) recently made considerable progress, obtaining a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that significantly improve both of these regret bounds.

horizon-free linear mixture MDP, improved regret analysis, variance-adaptive linear bandit

Industry:
- Education (0.59)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)