Qi Cai, Northwestern University, Evanston, IL 60208

Neural Information Processing Systems

Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. At the core of their empirical successes is the learned feature representation, which embeds rich observations, e.g., images and texts, into a latent space that encodes semantic structures. Meanwhile, the evolution of such a feature representation is crucial to the convergence of temporal-difference and Q-learning. In particular, temporal-difference learning converges when the function approximator is linear in a feature representation that is fixed throughout learning, and possibly diverges otherwise. We aim to answer the following question: when the function approximator is a neural network, how does the associated feature representation evolve?
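As a point of reference for the fixed-representation setting mentioned above, the following is a minimal sketch of TD(0) with a linear function approximator over a fixed feature map; the feature map, toy transition kernel, reward, and step size are illustrative assumptions, not the paper's construction.

import numpy as np

# Minimal TD(0) sketch with a *fixed* linear feature representation.
# The feature map, uniform toy transitions, reward, and step size are
# illustrative assumptions, not the paper's construction.
n_states, d = 5, 3
rng = np.random.default_rng(0)
phi = rng.standard_normal((n_states, d))   # fixed feature representation phi(s)
theta = np.zeros(d)                        # value estimate: V(s) = phi(s) @ theta
gamma, alpha = 0.9, 0.05

s = 0
for _ in range(10_000):
    s_next = int(rng.integers(n_states))   # toy uniform transition kernel
    r = 1.0 if s_next == n_states - 1 else 0.0
    # Semi-gradient TD(0) update: theta <- theta + alpha * delta * phi(s)
    delta = r + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]
    s = s_next

print("estimated values:", phi @ theta)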


The analysis of TD [21] requires an implicit local linearization with respect to the initial feature representation, which effectively keeps the feature representation fixed at its initial value throughout learning.
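To make the local linearization concrete, one standard neural-tangent-kernel-style expansion (stated here for illustration; the exact definition in [21] may differ) is
\[
\widehat{Q}(s, a; \theta) \;\approx\; \widehat{Q}(s, a; \theta_0) + \nabla_\theta \widehat{Q}(s, a; \theta_0)^\top (\theta - \theta_0),
\]
so that the approximator is linear in the feature representation $\phi_0(s, a) = \nabla_\theta \widehat{Q}(s, a; \theta_0)$ induced by the initial parameters $\theta_0$, and this representation does not evolve during learning.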


We appreciate the valuable comments from the reviewers. We study the discretization of the trajectory of the PDE in Proposition 3.1 and Appendix D, based on which we establish a discrete-time convergence rate in Corollary 4.4 by aggregating the discretization errors. We will cite the paper in our revision; thank you for pointing it out. On the other hand, we do understand that Assumptions B.1 [...]. Thus, we put Q-learning in the appendix as an extension of our main results for TD. It is worth noting that the universal approximation theorem (UAT) requires additional conditions on the target function, e.g., [...]. As UAT does not ensure the approximation of any [...], we show in Lemma C.1 that, in contrast, [...]. The proof is technical and requires certain preliminary knowledge of optimal transport, such as the Wasserstein gradient flow. We will include the following flowchart of the proof in the revision.
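For intuition on the discretization argument (a generic forward-Euler sketch under assumed smoothness, not the statement of Proposition 3.1): discretizing a continuous-time trajectory $\dot{\theta}(t) = g(\theta(t))$ with step size $\epsilon$ gives iterates
\[
\theta_{k+1} = \theta_k + \epsilon\, g(\theta_k),
\]
where each step incurs a local error of order $\epsilon^2$ when $g$ is Lipschitz; aggregating these errors over $k \le T/\epsilon$ steps yields a global deviation of order $\epsilon$, so the discrete iterates inherit the continuous-time convergence rate up to an $O(\epsilon)$ term.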


A Appendix


G. From Eq. (4), we have ϕ [...]. The proof is inspired by the universality proofs of prior symmetrization approaches [102, 74, 41]. Let ψ: X → Y be an arbitrary G-equivariant function. We leave proving this as future work. In general, we are interested in obtaining a faithful representation ρ, i.e., one such that ρ(g) is distinct for each g. We now show the following: Proposition 3. The proposed distribution p [...]. We now show the following: Proposition 4. The proposed distribution p [...]. We also note that scale(Q) gives an orthogonal matrix of determinant +1: it returns Q if det(Q) = +1; otherwise (det(Q) = −1, since Q is orthogonal) it scales the first column by −1, which flips the determinant to +1 while not affecting orthogonality.
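As an illustration of the scale(Q) operation described above, here is a minimal NumPy sketch; the function signature and the example matrix are ours, used only for illustration.

import numpy as np

def scale(Q: np.ndarray) -> np.ndarray:
    """Map an orthogonal matrix Q to one with determinant +1.

    If det(Q) = +1, return Q unchanged; otherwise det(Q) = -1 (since Q is
    orthogonal), so flipping the sign of the first column negates the
    determinant while leaving the columns orthonormal.
    """
    if np.linalg.det(Q) > 0:
        return Q
    Q = Q.copy()
    Q[:, 0] = -Q[:, 0]
    return Q

# Example: an orthogonal matrix with determinant -1 (a reflection).
Q = np.array([[0.0, 1.0],
              [1.0, 0.0]])
R = scale(Q)
print(np.linalg.det(R))   # ~ +1.0
print(R @ R.T)            # ~ identity, so R remains orthogonal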