Neural Temporal-Difference Learning Converges to Global Optima

Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang

Neural Information Processing Systems 

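Neural TD, that is, temporal-difference learning with the value function parameterized by an overparameterized neural network, converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error (MSPBE) for policy evaluation.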
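To make the claim concrete, here is a minimal, hypothetical sketch in plain Python of TD(0) with a neural value estimate. It assumes a two-layer ReLU network u(x; W) = (1/√m) Σ_r b_r·relu(W_rᵀx) whose output signs b_r are fixed at random, semi-gradient TD(0) updates on W, and a Euclidean projection of W onto a ball of radius `radius` around its initialization. The four-state cyclic chain, one-hot features, rewards, width, step size, radius, and all names (`TwoLayerValueNet`, `run_demo`, `msbe`) are illustrative assumptions. This is not the paper's experimental setup, and it does not reproduce its analysis or convergence rate. Progress is measured here by the mean-squared Bellman error under the chain's uniform stationary distribution.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


class TwoLayerValueNet:
    """Hypothetical two-layer ReLU value net: u(x; W) = (1/sqrt(m)) * sum_r b_r * relu(W_r . x)."""

    def __init__(self, dim, width, rng):
        self.m = width
        # Assumed init: W_r ~ N(0, I/dim); output signs b_r in {-1, +1} are drawn once and never trained.
        self.W = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(width)]
        self.b = [rng.choice((-1.0, 1.0)) for _ in range(width)]
        self.W0 = [row[:] for row in self.W]  # initialization, kept for the projection step

    def value(self, x):
        return sum(b_r * relu(dot(w_r, x)) for w_r, b_r in zip(self.W, self.b)) / math.sqrt(self.m)

    def td_step(self, x, x_next, reward, gamma, lr, radius):
        # Semi-gradient TD(0): the target reward + gamma * u(x_next) is treated as a constant.
        delta = self.value(x) - reward - gamma * self.value(x_next)
        scale = lr * delta / math.sqrt(self.m)
        for w_r, b_r in zip(self.W, self.b):
            if dot(w_r, x) > 0.0:  # ReLU gate: inactive units get zero gradient
                for i, x_i in enumerate(x):
                    w_r[i] -= scale * b_r * x_i
        self.project(radius)
        return delta

    def distance_from_init(self):
        return math.sqrt(sum((w - w0) ** 2
                             for row, row0 in zip(self.W, self.W0)
                             for w, w0 in zip(row, row0)))

    def project(self, radius):
        # Euclidean projection onto the ball {W : ||W - W0||_F <= radius}.
        dist = self.distance_from_init()
        if dist > radius:
            shrink = radius / dist
            for row, row0 in zip(self.W, self.W0):
                for i in range(len(row)):
                    row[i] = row0[i] + shrink * (row[i] - row0[i])


def msbe(net, feats, rewards, gamma):
    # Mean-squared Bellman error under the uniform stationary distribution of the cycle.
    n = len(feats)
    errs = [net.value(feats[s]) - rewards[s] - gamma * net.value(feats[(s + 1) % n])
            for s in range(n)]
    return sum(e * e for e in errs) / n


def run_demo(steps=4000, width=64, gamma=0.5, lr=0.5, radius=50.0, seed=0):
    rewards = [1.0, 0.0, 0.0, 0.5]  # hypothetical rewards on a 4-state cycle s -> s + 1 (mod 4)
    n = len(rewards)
    feats = [[1.0 if i == s else 0.0 for i in range(n)] for s in range(n)]  # one-hot features
    net = TwoLayerValueNet(n, width, random.Random(seed))
    before = msbe(net, feats, rewards, gamma)
    s = 0
    for _ in range(steps):  # follow the (deterministic) on-policy trajectory
        s_next = (s + 1) % n
        net.td_step(feats[s], feats[s_next], rewards[s], gamma, lr, radius)
        s = s_next
    return before, msbe(net, feats, rewards, gamma), net.distance_from_init()


if __name__ == "__main__":
    before, after, dist = run_demo()
    print(f"MSBE before: {before:.3e}  after: {after:.3e}  ||W - W0||_F: {dist:.3f}")
```

With one-hot features each state effectively owns its own slice of W, so this run behaves much like tabular TD and is easy to check. Richer features would make states share parameters, which is closer to the setting the paper studies.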
