Review for NeurIPS paper: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Feb-7-2025, 11:04:23 GMT–Neural Information Processing Systems

The paper presents new results on the convergence of TD and Q-learning when the action-value function is represented by overparameterized neural networks. The reviewers regard the theoretical contribution as solid. The weaknesses they describe are not major and can be addressed in a minor revision; I therefore recommend accepting this paper.

mean-field theory, neurips paper, temporal-difference and q-learning learn representation

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)