Reviews: Regret Bounds for Learning State Representations in Reinforcement Learning

Neural Information Processing Systems 

This paper proposes a natural extension of UCRL2 to learning state representations. The proposed algorithm chooses optimistically over a finite set of candidate MDPs and their corresponding policies. The algorithm is analyzed and improves over existing regret bounds. The paper was discussed and all reviewers agree that this is a natural extension of UCRL2 that deserves to be published.