Reviews: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

Jan-21-2025, 17:38:40 GMT–Neural Information Processing Systems

This paper provides with a theoretical analysis of optimistic algorithms in the setting of episodic MDPs. The authors provide with a problem-dependent and gap-dependent log(T) regret bound which is a finite-time bound, that unlike previous bounds does not take into account other problem properties such as diameter or mixing times. The paper includes several theoretical results, and introduces a novel'clipping' trick used to prove the bounds. I found the paper interesting, however, it is constructed in a way which is hard to follow and not very readable. First, many concepts are referred to before they are defined (for instance, the notion of optimistic algorithms).

intuition, non-asymptotic gap-dependent regret bound, tabular mdp, (2 more...)

Neural Information Processing Systems

Jan-21-2025, 17:38:40 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.94)