Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Kim, Yeoneung, Yang, Insoon, Jun, Kwang-Sung

Nov-5-2021–arXiv.org Machine Learning

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, a considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde O(d^{1.5}\sqrt{\sum_{k}^K \sigma_k^2} + d^2)$ where $d$ is the dimension of the features, $K$ is the time horizon, and $\sigma_k^2$ is the noise variance at time step $k$, and $\tilde O$ ignores polylogarithmic dependence, which is a factor of $d^3$ improvement. For linear mixture MDPs, we achieve a horizon-free regret bound of $\tilde O(d^{1.5}\sqrt{K} + d^3)$ where $d$ is the number of base models and $K$ is the number of episodes. This is a factor of $d^3$ improvement in the leading term and $d^6$ in the lower order term. Our analysis critically relies on a novel elliptical potential `count' lemma. This lemma allows a peeling-based regret analysis, which can be of independent interest.

improved regret analysis, lemma 5, variance-adaptive linear bandit, (8 more...)

arXiv.org Machine Learning

Nov-5-2021

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Arizona (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - South Korea > Seoul
    - Seoul (0.04)

Genre:
- Research Report (0.40)

Industry:
- Education > Educational Setting > Online (0.34)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Enterprise Applications > Human Resources
    - Learning Management (0.34)
  - Data Science > Data Mining
    - Big Data (0.46)