Reviews: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems 

As such, it opens up potential new research approaches along with providing an improvement on the SOTA.

Quality: The argument is well developed, and extensive proofs are provided in the supplementary materials or referenced from existing literature. The greedy approach is applied directly to two existing SOTA full-planning-based algorithms, suggesting that it is a generalizable alternative.

Clarity: The paper is generally well organized and clear, and it gives an intuitive sense of the results, although the bulk of the proofs are confined to the supplementary material. Several scattered clarity issues are described in the detailed comments below.