Review for NeurIPS paper: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
–Neural Information Processing Systems
The paper presents new results on the convergence of TD and Q-learning when the action-value function is represented by overparameterized neural networks. The theoretical contribution is seen as solid. The weaknesses described by the reviewers are not major and can be addressed in a minor revision; I therefore recommend accepting this paper.
Feb-7-2025, 11:04:23 GMT