Convergence of Actor-Critic Methods with Multi-Layer Neural Networks
–Neural Information Processing Systems
The early theory of actor-critic methods considered convergence using linear function approximators for the policy and value functions. Recent work has established convergence using neural network approximators with a single hidden layer. In this work we are taking the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice. We show that actor-critic updates projected on a ball around the initial condition will converge to a neighborhood where the average of the squared gradients is O (1 / m) + O (ϵ), with m being the width of the neural network and ϵ the approximation quality of the best critic neural network over the projected set.
Neural Information Processing Systems
Nov-14-2025, 03:23:41 GMT
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom
- Technology: