Review for NeurIPS paper: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

Neural Information Processing Systems 

Summary and Contributions: This paper studies the exploration problem in episodic reinforcement learning with kernel and neural network function approximation. The proposed algorithm is an optimistic version of least-squares value iteration (LSVI), in which a bonus function is added to the standard LSVI solution to encourage exploration. Under assumptions on the underlying RKHS or NTK function classes, the proposed algorithms are shown to achieve H^2 \sqrt{T} \delta_F regret, where \delta_F depends on the effective dimension of the RKHS or NTK.

First, state clearly in the introduction (and perhaps also in the abstract) that the paper assumes the transition model is characterized by the RKHS class. I think you already do, but it doesn't hurt to emphasize it. Also, revise the sentence "propose the first provable efficient RL algorithm [...] without any additional assumptions on the sampling model" (lines 44-46), e.g., by replacing the ambiguous term "sampling model" with "generative model" or "simulator".
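
For readers of this review, the optimistic LSVI step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the function names (rbf_kernel, optimistic_q), and the parameters lam and beta are assumptions chosen for illustration. The sketch fits a kernel ridge regression to hypothetical regression targets, adds a bonus proportional to the kernel posterior width, and truncates the result at the horizon H.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel matrix between rows of A (n, d) and rows of B (m, d)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def optimistic_q(Z_train, y_train, Z_query, H, lam=1.0, beta=1.0):
    """Kernel ridge regression estimate plus an exploration bonus.

    Z_train: (n, d) observed state-action features (hypothetical encoding).
    y_train: (n,) regression targets, e.g. reward plus next-step value.
    Z_query: (m, d) state-action features at which to evaluate.
    lam, beta: regularization and bonus scale (illustrative assumptions).
    Returns the truncated optimistic estimate and the bonus, both (m,).
    """
    K = rbf_kernel(Z_train, Z_train)
    k_q = rbf_kernel(Z_train, Z_query)           # shape (n, m)
    A = K + lam * np.eye(len(Z_train))

    # Least-squares (kernel ridge) solution evaluated at the query points.
    mean = k_q.T @ np.linalg.solve(A, y_train)

    # Posterior width: k(z, z) - k_z^T (K + lam I)^{-1} k_z, with k(z, z) = 1 for RBF.
    width_sq = 1.0 - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    bonus = beta / np.sqrt(lam) * np.sqrt(np.maximum(width_sq, 0.0))

    # Optimism: add the bonus, then truncate at the maximum possible return H.
    return np.minimum(mean + bonus, H), bonus
```

In this sketch the bonus is small at state-action pairs close to the observed data and approaches beta / sqrt(lam) far from it, which is what drives exploration toward poorly covered regions.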