Reviews: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Neural Information Processing Systems
The paper defines "Uniform-PAC", where the uniformity is over the optimality criterion epsilon. It is PAC-like in that epsilon-optimal actions are taken in all but a bounded number of steps. It is also regret-like in that the algorithm eventually becomes good relative to any epsilon, not just the one it is told to meet. I thought the discussion of the different performance metrics was thorough and informative. I would have liked more intuition about the iterated-logarithm idea and its main properties, but I understand that the highly technical material had to be expressed in very limited space.
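To make the "uniform over epsilon" distinction concrete, here is a minimal sketch. It assumes that each episode k has a suboptimality gap Delta_k = V* - V^{pi_k}. It counts N_eps, the number of episodes with gap above epsilon, and checks whether one bound function F(1/eps) holds for every epsilon on a grid simultaneously. A classical PAC guarantee only checks a single, fixed epsilon. The gap sequence and the bound F(x) = x**2 are hypothetical and chosen only for illustration. They are not the paper's algorithm or its actual bound, and the sketch checks a finite grid rather than all epsilon > 0.

```python
from typing import Callable, Iterable, Sequence


def mistake_count(gaps: Sequence[float], eps: float) -> int:
    """N_eps: number of episodes whose suboptimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)


def satisfies_uniform_bound(
    gaps: Sequence[float],
    bound: Callable[[float], float],
    eps_grid: Iterable[float],
) -> bool:
    """Check N_eps <= bound(1/eps) for every eps in the grid at once.

    A fixed-eps PAC check would evaluate a single eps; the uniform
    version must hold for all of them with the same bound function.
    """
    return all(mistake_count(gaps, eps) <= bound(1.0 / eps) for eps in eps_grid)


def regret(gaps: Sequence[float]) -> float:
    """Cumulative regret: the sum of per-episode suboptimality gaps."""
    return sum(gaps)


if __name__ == "__main__":
    # Hypothetical learner whose gap in episode k shrinks like 1/sqrt(k),
    # so #{k : gap_k > eps} is at most 1/eps**2 for every eps.
    gaps = [1.0 / k ** 0.5 for k in range(1, 10_001)]
    grid = [0.5, 0.1, 0.05, 0.01]
    print(satisfies_uniform_bound(gaps, lambda x: x ** 2, grid))
    print(satisfies_uniform_bound(gaps, lambda x: x, grid))
    print(round(regret(gaps), 1))
```

The same gap sequence supports both views. Counting gaps above each epsilon gives the PAC-style mistake bound. Summing the gaps gives the regret. This mirrors the review's point that Uniform-PAC sits between the two metrics.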
Oct-7-2024, 14:25:48 GMT