Neural Temporal-Difference Learning Converges to Global Optima

Cai, Qi, Yang, Zhuoran, Lee, Jason D., Wang, Zhaoran

Mar-19-2020, 01:16:25 GMT–Neural Information Processing Systems

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD.
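To make the setting concrete, the following is a minimal sketch of semi-gradient TD(0) policy evaluation with a wide two-layer ReLU network. It is an illustration under stated assumptions, not the exact algorithm analyzed in the paper. The abstract does not specify the toy Markov reward process, the random state features, the fixed ±1 output weights, the 1/sqrt(width) scaling, or the step size. All of these, along with the function name `neural_td`, are hypothetical choices made here for illustration.

```python
import numpy as np


def neural_td(num_states=5, width=512, dim=8, gamma=0.9, lr=0.05,
              steps=20000, seed=0):
    """Semi-gradient TD(0) with a wide two-layer ReLU network (illustrative)."""
    rng = np.random.default_rng(seed)

    # Hypothetical Markov reward process induced by a fixed policy:
    # random row-stochastic transition matrix and per-state rewards.
    P = rng.random((num_states, num_states))
    P /= P.sum(axis=1, keepdims=True)
    r = rng.random(num_states)

    # Assumed state representation: random unit-norm feature vectors.
    X = rng.normal(size=(num_states, dim))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    # Two-layer network V(x) = b^T relu(W x) / sqrt(width).
    # Only W is trained; the output signs b stay fixed (an assumption).
    W = rng.normal(size=(width, dim))
    b = rng.choice([-1.0, 1.0], size=width)

    def value(x, W):
        return b @ np.maximum(W @ x, 0.0) / np.sqrt(width)

    def grad(x, W):
        # Gradient of value(x, W) with respect to W, shape (width, dim).
        active = (W @ x > 0).astype(float)
        return (b * active)[:, None] * x[None, :] / np.sqrt(width)

    s = 0
    for _ in range(steps):
        s_next = rng.choice(num_states, p=P[s])
        # TD error: bootstrapped target minus current estimate.
        delta = r[s] + gamma * value(X[s_next], W) - value(X[s], W)
        # Semi-gradient step: the target is treated as a constant.
        W += lr * delta * grad(X[s], W)
        s = s_next

    # Exact value function of the toy process, for comparison only.
    v_true = np.linalg.solve(np.eye(num_states) - gamma * P, r)
    v_hat = np.array([value(x, W) for x in X])
    return v_true, v_hat


if __name__ == "__main__":
    v_true, v_hat = neural_td()
    print("true values:     ", np.round(v_true, 3))
    print("neural TD values:", np.round(v_hat, 3))
```

Raising `width` makes the network more heavily overparametrized, which is the regime the abstract credits for global convergence. This toy run only compares the learned values with the exact solution and does not verify the paper's sublinear rate.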

