Reviews: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Neural Information Processing Systems 

The paper defines "Uniform-PAC", where the uniformity is over the optimality criterion, epsilon. It is PAC-like in that epsilon-optimal actions are taken in all but a bounded number of steps. It is also regret-like in that the algorithm is eventually good relative to any epsilon, not just the one it is told to meet. I thought the discussion of the different performance metrics was thorough and informative. I would have liked more intuition about the iterated logarithm idea and its main properties, but I understand that the highly technical material had to be expressed in very limited space.
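
For concreteness, my reading of the definition can be sketched as below. The notation is an assumption on my part and may differ from the paper's: \Delta_k denotes the optimality gap of the policy followed in episode k, N_\varepsilon counts the episodes that are not epsilon-optimal, and F_{UPAC} stands for a bound polynomial in its arguments, with the dependence on the problem size suppressed.

```latex
% Sketch of the Uniform-PAC condition under assumed notation.
% \Delta_k : optimality gap of the policy used in episode k (assumed symbol)
% N_\varepsilon : number of episodes whose policy is not \varepsilon-optimal
N_\varepsilon = \sum_{k=1}^{\infty} \mathbb{I}\{\Delta_k > \varepsilon\},
\qquad
\mathbb{P}\Bigl(\exists\, \varepsilon > 0 :\;
  N_\varepsilon > F_{\mathrm{UPAC}}\bigl(1/\varepsilon, \log(1/\delta)\bigr)\Bigr) \le \delta .
```

The contrast with ordinary PAC is that there the bound is required only for a single epsilon fixed in advance, whereas here one high-probability event has to cover every epsilon simultaneously.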