Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning

Neural Information Processing Systems 

This paper introduces a stronger metric, the uniform last-iterate (ULI) guarantee, which captures both the cumulative and the instantaneous performance of RL algorithms.
