Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

Lam, Samuel Chun-Hei; Sirignano, Justin; Wang, Ziheng

Mar-25-2024, arXiv.org Machine Learning

Neural network actor-critic algorithms are among the most popular methods in deep reinforcement learning. One neural network (the "actor") is trained to select the policy while a second neural network (the "critic") is simultaneously trained to learn the value function under the actor's policy. Specifically, the actor selects an action; given that action, a state transition occurs according to a Markov stochastic process and a reward (a measure of success or failure) is observed. The critic must learn to approximate the value function, i.e., the solution to the Bellman equation, for the actor's current policy. Given the critic's estimate of that value function, the actor is then updated to improve it (i.e., to increase the expected reward). Actor-critic algorithms are well-established methods in reinforcement learning [17, 15]; the key recent advance is using (deep) neural networks as the actor and critic and training their parameters with gradient descent methods [26, 10, 25, 2, 29]. Analysis of neural network actor-critic algorithms is challenging due to: (1) the non-convexity of the neural networks; (2) the complex feedback loop between the actor and critic (the actor determines the sequence of data samples used to train the critic, and the critic is used to train the actor); and (3) the simultaneous online updates of both the actor and critic, which cause (3A) the distribution of the data, which depends upon the actor, to evolve constantly in time and (3B) the actor to be updated with a noisy, biased estimate of the value function.
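
The actor/critic feedback loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state, two-action toy environment (action 1 pays reward 1, the next state is uniform), the single-hidden-layer tanh networks with 1/sqrt(N) output scaling, the step sizes, and the helper names `ShallowNet`, `encode`, `policy` and `train`. The critic takes a semi-gradient TD(0) step on the sampled transition, and in the same iteration the actor takes a policy-gradient step weighted by the critic's current, possibly biased, estimate.

```python
import numpy as np


class ShallowNet:
    """Hypothetical single-hidden-layer net: f(x) = c . tanh(W x) / sqrt(N)."""

    def __init__(self, in_dim, hidden, rng):
        self.W = rng.normal(size=(hidden, in_dim))
        self.c = rng.uniform(-1.0, 1.0, size=hidden)
        self.scale = 1.0 / np.sqrt(hidden)

    def value_and_grad(self, x):
        # Output and its gradients with respect to c and W at input x.
        h = np.tanh(self.W @ x)
        out = self.scale * (self.c @ h)
        grad_c = self.scale * h
        grad_W = self.scale * np.outer(self.c * (1.0 - h**2), x)
        return out, grad_c, grad_W

    def ascend(self, grad_c, grad_W, step):
        self.c += step * grad_c
        self.W += step * grad_W


def encode(s, a, n_states=2, n_actions=2):
    # One-hot encoding of the (state, action) pair.
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x


def policy(actor, s, n_actions=2):
    # Softmax policy over the actor's outputs for each action in state s.
    outs = [actor.value_and_grad(encode(s, b)) for b in range(n_actions)]
    logits = np.array([o[0] for o in outs])
    p = np.exp(logits - logits.max())
    return p / p.sum(), outs


def train(steps=5000, gamma=0.5, lr_actor=0.05, lr_critic=0.5, seed=0):
    rng = np.random.default_rng(seed)
    actor = ShallowNet(4, 64, rng)
    critic = ShallowNet(4, 64, rng)
    s = 0
    a = int(rng.choice(2, p=policy(actor, s)[0]))
    for _ in range(steps):
        # Assumed toy environment: action 1 pays 1, next state is uniform.
        r = float(a == 1)
        s_next = int(rng.integers(2))
        a_next = int(rng.choice(2, p=policy(actor, s_next)[0]))

        # Critic: TD error for the sampled transition under the actor's policy.
        q, gc, gW = critic.value_and_grad(encode(s, a))
        q_next, _, _ = critic.value_and_grad(encode(s_next, a_next))
        td_error = r + gamma * q_next - q

        # Actor: grad log pi(a|s) = grad f(s,a) - sum_b pi(b|s) grad f(s,b).
        probs, outs = policy(actor, s)
        glog_c = outs[a][1] - sum(probs[b] * outs[b][1] for b in range(2))
        glog_W = outs[a][2] - sum(probs[b] * outs[b][2] for b in range(2))

        # Simultaneous online updates of critic and actor.
        critic.ascend(gc, gW, lr_critic * td_error)
        actor.ascend(glog_c, glog_W, lr_actor * q)

        s, a = s_next, a_next
    # Rows are states, columns are action probabilities.
    return np.array([policy(actor, st)[0] for st in range(2)])
```

Calling `train()` returns a 2x2 array of action probabilities, one row per state. Because the actor samples the transitions that train the critic while the critic's estimate scales the actor's step, the sketch exhibits both difficulties (3A) and (3B) in miniature: the data distribution shifts as the policy changes, and the actor's update uses an estimate that is still being learned.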
