analysis of TD [21] requires an implicit local linearization with respect to the initial feature representation, which

Neural Information Processing Systems 

We appreciate the valuable comments from the reviewers. We study the discretization of the trajectory of the PDE in Proposition 3.1 and Appendix D, based on which we establish a discrete-time convergence rate in Corollary 4.4 by aggregating the We will cite the paper in our revision. Thank you for pointing this out. On the other hand, we do understand that Assumptions B.1 Thus, we put Q-learning in the appendix as an extension of our main results for TD. It is worth noting that UAT requires additional conditions on the target function, e.g., As UAT doesn't ensure the approximation of any In contrast, we show in Lemma C.1 that, The proof is technical and requires certain preliminary knowledge of optimal transport, such as the Wasserstein gradient flow. We will include the following flowchart of the proof in the revision.