Neural Temporal-Difference Learning Converges to Global Optima
Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Neural Information Processing Systems
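Neural temporal-difference (TD) learning, which approximates the value function with an overparameterized neural network, converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation.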
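The following minimal Python sketch makes the setting concrete. It runs projected semi-gradient TD(0) with a two-layer ReLU network whose output weights are fixed, and it projects the hidden weights back onto a ball around their initialization. Every specific here is an illustrative assumption, not the authors' implementation: the hypothetical function `neural_td_toy`, the 3-state Markov chain, the one-hot features, the width, step size, and projection radius, and the tail averaging used to read out the estimate. The sketch also evaluates state values rather than action values. Because one-hot features let a wide network represent the true value function, the MSPBE optimum should coincide with the exact values, so TD's estimates can be checked against them. The sketch does not measure the convergence rate.

```python
import math
import random


def neural_td_toy(num_steps=5000, width=64, step_size=0.2, gamma=0.5,
                  radius=25.0, seed=0):
    """Hypothetical toy: projected semi-gradient TD(0) with a two-layer ReLU net.

    Returns (tail-averaged value estimates, exact values) for a made-up
    3-state Markov chain with one-hot (unit-norm) state features.
    """
    rng = random.Random(seed)
    P = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5]]      # doubly stochastic, so the stationary distribution is uniform
    reward = [1.0, 0.0, 2.0]   # reward received in each state (assumed)
    n = len(P)
    feats = [[1.0 if i == s else 0.0 for i in range(n)] for s in range(n)]
    d = n

    # Exact values by repeated Bellman backups V <- r + gamma * P V.
    v_true = [0.0] * n
    for _ in range(200):
        v_true = [reward[s] + gamma * sum(P[s][t] * v_true[t] for t in range(n))
                  for s in range(n)]

    # Network V(x) = (1/sqrt(m)) * sum_r b_r * relu(W_r . x); only W is trained.
    b = [rng.choice((-1.0, 1.0)) for _ in range(width)]
    w0 = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(width)]
    w = [row[:] for row in w0]
    scale = 1.0 / math.sqrt(width)

    def value(x):
        return scale * sum(b[r] * max(0.0, sum(wi * xi for wi, xi in zip(w[r], x)))
                           for r in range(width))

    tail_sum = [0.0] * n
    tail_count = 0
    for t in range(num_steps):
        s = rng.randrange(n)                               # s ~ stationary (uniform)
        s_next = rng.choices(range(n), weights=P[s])[0]    # s' ~ P(. | s)
        x, x_next = feats[s], feats[s_next]
        delta = value(x) - reward[s] - gamma * value(x_next)   # TD error
        # Semi-gradient step: differentiate V(x) only; the TD target is held fixed.
        for r in range(width):
            if sum(wi * xi for wi, xi in zip(w[r], x)) > 0.0:  # ReLU unit is active
                coef = step_size * delta * scale * b[r]
                w[r] = [wi - coef * xi for wi, xi in zip(w[r], x)]
        # Project onto the Frobenius-norm ball of radius `radius` around w0.
        dist = math.sqrt(sum((wi - w0i) ** 2
                             for row, row0 in zip(w, w0)
                             for wi, w0i in zip(row, row0)))
        if dist > radius:
            shrink = radius / dist
            w = [[w0i + shrink * (wi - w0i) for wi, w0i in zip(row, row0)]
                 for row, row0 in zip(w, w0)]
        # Average the estimates over the second half of training to damp noise.
        if t >= num_steps // 2:
            for k in range(n):
                tail_sum[k] += value(feats[k])
            tail_count += 1

    return [v / tail_count for v in tail_sum], v_true


estimate, truth = neural_td_toy()
print("TD estimate:", [round(v, 3) for v in estimate])
print("exact      :", [round(v, 3) for v in truth])
```

For this chain the exact values are 22/13, 14/13 and 42/13 (about 1.692, 1.077 and 3.231). The tail-averaged TD estimates should land close to them. The projection radius is deliberately generous, so in this toy the constraint rarely, if ever, binds.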
Feb-13-2026, 02:18:02 GMT