Review for NeurIPS paper: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Neural Information Processing Systems
This paper studies the exploration problem in episodic reinforcement learning with kernel and neural network function approximation. The authors propose a novel algorithm, an optimistic version of least-squares value iteration (LSVI), in which an exploration bonus is added to the standard LSVI solution. They derive regret bounds for this algorithm under two function classes: reproducing kernel Hilbert spaces (RKHS) and the neural tangent kernel (NTK) regime. Overall, the technical contribution of the paper seems solid. Some reviewers raised concerns about the assumptions behind the analysis, in particular the assumption that the Bellman optimality update lies in the RKHS.
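To make the "LSVI plus bonus" idea concrete, here is a minimal sketch of one optimistic value estimate in the kernel setting. It is an illustration under stated assumptions, not the authors' implementation: the RBF kernel, the function names (`rbf_kernel`, `solve`, `optimistic_q`), and the parameters `lam`, `beta`, and `cap` are all hypothetical choices made for this example. The bonus used is the standard kernel ridge regression uncertainty width, which may differ in detail from the one analyzed in the paper.

```python
import math

def rbf_kernel(x, y, lengthscale=1.0):
    # Assumed kernel for illustration; the paper's analysis covers general RKHS kernels.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def optimistic_q(z, data, targets, lam=1.0, beta=1.0, cap=None):
    """Return (optimistic value, bonus) at the state-action point z.

    data    : previously visited state-action points (tuples of floats)
    targets : regression targets, e.g. reward plus next-step value estimate
    lam     : ridge regularization (hypothetical default)
    beta    : bonus scaling (hypothetical default)
    cap     : optional upper truncation of the optimistic value
    """
    n = len(data)
    # Regularized Gram matrix K + lam * I.
    K = [[rbf_kernel(data[i], data[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_z = [rbf_kernel(z, x) for x in data]
    # Kernel ridge regression estimate: k_z^T (K + lam I)^{-1} y.
    alpha = solve(K, targets)
    mean = sum(a * k for a, k in zip(alpha, k_z))
    # Uncertainty width: lam^{-1/2} * sqrt(k(z,z) - k_z^T (K + lam I)^{-1} k_z).
    w = solve(K, k_z)
    var = rbf_kernel(z, z) - sum(a * b for a, b in zip(k_z, w))
    bonus = math.sqrt(max(var, 0.0) / lam)
    q = mean + beta * bonus
    return (min(q, cap) if cap is not None else q), bonus
```

Calling `optimistic_q((0.0,), [(0.0,), (1.0,)], [1.0, 0.5])` returns the regression estimate inflated by a bonus that shrinks near visited points and grows far from them, which is what drives exploration in this kind of scheme.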
Jan-27-2025, 03:17:01 GMT