Neural Temporal-Difference Learning Converges to Global Optima

Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Oct-3-2025, 06:53:11 GMT–Neural Information Processing Systems

Neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation.

arXiv preprint, function approximation, neural network

Country:
- North America > Canada (0.04)
- Asia > Middle East
  - Jordan (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Reinforcement Learning (1.00)
