Neural Temporal-Difference Learning Converges to Global Optima
Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Neural Information Processing Systems
Neural temporal-difference (TD) learning converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation.
Oct-3-2025, 06:53:11 GMT