Review for NeurIPS paper: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Neural Information Processing Systems 

Additional Feedback: In the definition of the continuity equation, what does "div" stand for, and how is it defined? The definition of Q-hat in (3.1) implies that the activation function sigma is applied only in the first layer of the network. How much harder would the problem be to analyze if the second layer also applied an activation function? I guess the dimensions D and d should be closely related, e.g.
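For concreteness, here is a sketch of the standard definitions the first question refers to, assuming the paper uses the usual continuity equation from mean-field analyses (the symbols rho, v, and theta below are generic placeholders, not necessarily the paper's notation):

```latex
% Divergence of a vector field v : \mathbb{R}^D \to \mathbb{R}^D,
% i.e., the sum of the partial derivatives of its coordinates:
\operatorname{div} v(\theta) \;=\; \sum_{i=1}^{D} \frac{\partial v_i(\theta)}{\partial \theta_i}.

% Continuity equation: the parameter density \rho_t is transported
% along the velocity field v_t without creating or destroying mass:
\partial_t \rho_t(\theta) \;+\; \operatorname{div}\!\bigl(\rho_t(\theta)\, v_t(\theta)\bigr) \;=\; 0.
```

Stating these definitions explicitly in the paper (or in a notation section) would make the continuity equation self-contained for readers outside the mean-field literature.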