Reviews: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
–Neural Information Processing Systems
This paper provides with a theoretical analysis of optimistic algorithms in the setting of episodic MDPs. The authors provide with a problem-dependent and gap-dependent log(T) regret bound which is a finite-time bound, that unlike previous bounds does not take into account other problem properties such as diameter or mixing times. The paper includes several theoretical results, and introduces a novel'clipping' trick used to prove the bounds. I found the paper interesting, however, it is constructed in a way which is hard to follow and not very readable. First, many concepts are referred to before they are defined (for instance, the notion of optimistic algorithms).
Neural Information Processing Systems
Jan-21-2025, 17:38:40 GMT
- Technology: