Review for NeurIPS paper: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

Neural Information Processing Systems 

This assumption is crucial for a dynamic-programming-type analysis. Although it appears to be an assumption only on the value function class, it also implicitly makes an assumption about the MDP. The difference between the sensitivity sampling used in this paper and that used in prior work also needs to be discussed in more detail. Moreover, the connection between Lemmas 9 and 10 of this paper and Proposition 3 and Lemma 2 in [44] should also be discussed. I think these discussions would help readers understand the details and digest the proof.