Review for NeurIPS paper: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Neural Information Processing Systems
Additional Feedback: In the definition of the continuity equation, what does "div" stand for, and how is it defined? The definition of Q-hat in (3.1) implies that the activation function sigma is applied only in the first layer of the network. How much harder would the analysis be if the second layer also applied an activation function? I guess the dimensions D and d should be closely related, e.g.
Feb-7-2025, 11:04:30 GMT