Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee
