Last Iterate Analyses of FTRL in Stochasitc Bandits

Oct-28-2025–arXiv.org Artificial Intelligence

The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis, showing that the Bregman divergence, defined by the regular function $Ψ(p)=-4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

Oct-28-2025

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom (0.28)

Genre:
- Research Report (0.64)

Industry:
- Education (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.88)
  - Artificial Intelligence > Machine Learning
    - Computational Learning Theory (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found