Neural Information Processing Systems 

The main contributions are: (a) a new algorithm, inspired by UCRLγ for the discounted setting; and (b) a theoretical analysis showing that the new algorithm enjoys sample-complexity guarantees that scale optimally with the horizon, the accuracy, and the confidence. The result is sub-optimal only with respect to the size of the state space, where the bound scales potentially quadratically, whereas the known lower bounds are linear in the size of the state space.