Reviews: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Oct-7-2024, 14:25:48 GMT–Neural Information Processing Systems

The paper defines "Uniform-PAC", where the uniformity is over the optimality criterion, epsilon. It is PAC-like in that optimal actions are taken in all but a bounded number of steps. It is also regret-like in that the algorithm is eventually good relative to any epsilon, not just one it is told to meet. I thought the discussion of different performance metrics was thorough and informative. I would have liked more intuition about the iterated logarithm idea and its main properties, but I understand that the highly technical material had to be expressed in very limited space.


