Review for NeurIPS paper: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Neural Information Processing Systems 

Additional Feedback: In the definition of the continuity equation, what does "div" stand for, and how is it defined? The definition of Q-hat in (3.1) implies that the activation function sigma is applied only in the first layer of the network. How much harder would the problem be to analyze if the second layer also applied an activation function? I guess the dimensions D and d should be closely related, e.g.
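For concreteness, here is a sketch of the standard definitions the first question refers to, assuming the paper uses the usual continuity equation from mean-field analyses (the symbols rho, v, and theta below are generic placeholders, not necessarily the paper's notation):

```latex
% Divergence of a vector field v : \mathbb{R}^D \to \mathbb{R}^D,
% i.e., the sum of the partial derivatives of its coordinates:
\operatorname{div} v(\theta) \;=\; \sum_{i=1}^{D} \frac{\partial v_i(\theta)}{\partial \theta_i}.

% Continuity equation: the parameter density \rho_t is transported
% along the velocity field v_t without creating or destroying mass:
\partial_t \rho_t(\theta) \;+\; \operatorname{div}\!\bigl(\rho_t(\theta)\, v_t(\theta)\bigr) \;=\; 0.
```

Stating these definitions explicitly in the paper (or in a notation section) would make the continuity equation self-contained for readers outside the mean-field literature.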