Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods
Carmona, René, Laurière, Mathieu, Tan, Zongjun
arXiv.org Artificial Intelligence
We investigate reinforcement learning for Markov decision processes with a large number of exchangeable agents interacting in a mean-field manner. Applications include, for example, the control of a large number of robots communicating through a central unit that dispatches the optimal policy computed by maximizing an aggregate reward. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states and actions of the other agents. We first provide a full analysis of this discrete-time mean-field control problem. We then rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting and establish bounds on the rates of convergence. We also provide graphical evidence of the convergence based on implementations of our algorithms.
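To illustrate the kind of method the abstract describes, here is a minimal sketch of a model-free (zeroth-order) policy gradient on a scalar mean-field LQ problem. This is not the paper's algorithm or its setting: the coefficients (A, Abar, B, Q, Qbar, R), the finite horizon, the population size, and the two-point gradient estimator are all illustrative assumptions. A generic agent uses a symmetric linear feedback u = -K x - L m, where m is the empirical population mean, and the gains (K, L) are improved by gradient descent on a simulated cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar mean-field LQ problem (coefficients are assumptions,
# not taken from the paper): x_{t+1} = A x_t + Abar m_t + B u_t, with
# stage cost Q x^2 + Qbar (x - m_t)^2 + R u^2, m_t = population mean.
A, Abar, B = 0.9, 0.2, 1.0
Q, Qbar, R = 1.0, 0.5, 0.1
T, N = 20, 500                       # horizon, number of simulated agents
x0 = rng.standard_normal(N)          # fixed initial states (common random numbers)

def cost(theta):
    """Total simulated cost of the symmetric feedback u = -K x - L m."""
    K, L = theta
    x = x0.copy()
    J = 0.0
    for _ in range(T):
        m = x.mean()                 # empirical mean field
        u = -K * x - L * m
        J += np.mean(Q * x**2 + Qbar * (x - m)**2 + R * u**2)
        x = A * x + Abar * m + B * u
    return J

def zo_gradient(theta, delta=0.05, n_dirs=10):
    """Two-point zeroth-order gradient estimate (model-free; the average
    recovers the true gradient only up to a dimension-dependent scale,
    which is harmless for a descent direction)."""
    g = np.zeros(2)
    for _ in range(n_dirs):
        d = rng.standard_normal(2)
        d /= np.linalg.norm(d)       # random unit direction
        g += (cost(theta + delta * d) - cost(theta - delta * d)) / (2 * delta) * d
    return g / n_dirs

theta = np.zeros(2)                  # gains (K, L), initialized at zero
for _ in range(200):
    theta -= 1e-3 * zo_gradient(theta)   # gradient descent on the gains
```

The paper's analysis concerns exactly this type of scheme: it proves convergence of exact and zeroth-order policy gradient iterations for mean-field LQ problems and bounds their rates, whereas the sketch above only demonstrates the mechanics on a toy instance.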
Apr-30-2025