Laurière, Mathieu
On Imitation in Mean-field Games
Ramponi, Giorgia, Kolev, Pavel, Pietquin, Olivier, He, Niao, Laurière, Mathieu, Geist, Matthieu
Imitation learning (IL) is a popular framework in which an apprentice agent learns to imitate the behavior of an expert agent by observing its actions and transitions. In the context of mean-field games (MFGs), IL is used to learn a policy that imitates the behavior of a population of infinitely many expert agents following a Nash equilibrium policy under some unknown payoff function. Mean-field games are an approximation introduced to simplify the analysis of games with a large (but finite) number of identical players: one studies the interaction between a representative infinitesimal player and a term capturing the population's behavior. The MFG framework makes it possible to scale to an infinite number of agents, with both the reward and the transition being population-dependent. The aim is to learn policies that effectively imitate the behavior of a large population of agents, a crucial problem in many real-world applications such as traffic management [12, 30, 31], crowd control [11, 1], and financial markets [6, 5].
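For reference, the equilibrium notion imitated here can be written as a fixed point (generic notation, not taken from the paper): $$\pi^* \in \arg\max_{\pi} J(\pi, \mu^{\pi^*}),$$ where $J(\pi, \mu)$ is the expected payoff of a representative player using policy $\pi$ against the population distribution $\mu$, and $\mu^{\pi^*}$ is the distribution induced when the whole population plays $\pi^*$. In the IL setting, the payoff defining $J$ is unknown and only expert trajectories are observed.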
Recent Developments in Machine Learning Methods for Stochastic Control and Games
Hu, Ruimeng, Laurière, Mathieu
Stochastic optimal control and games have found a wide range of applications, from finance and economics to social sciences, robotics, and energy management. Many real-world applications involve complex models, which has driven the development of sophisticated numerical methods. Recently, computational methods based on machine learning have been developed for stochastic control problems and games. We review such methods, with a focus on deep learning algorithms that have made it possible to solve such problems even when the dimension is high or when the structure is very complex, beyond what is feasible with traditional numerical methods. Here, we consider mostly the continuous-time and continuous-space setting. Many of the new approaches build on recent neural-network-based methods for high-dimensional partial differential equations or backward stochastic differential equations, or on model-free reinforcement learning for Markov decision processes, which have led to breakthrough results. In this paper, we provide an introduction to these methods and summarize state-of-the-art works on machine learning for stochastic control and games.
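As a pointer to one of the building blocks mentioned above, the (textbook) backward stochastic differential equation targeted by deep BSDE-type methods reads $$Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s,$$ and the deep BSDE approach treats $Y_0$ as a trainable parameter, approximates $Z_t \approx \mathcal{Z}_\theta(t, X_t)$ with neural networks, simulates the system forward, and minimizes the terminal mismatch $\mathbb{E}\big[|Y_T - g(X_T)|^2\big]$.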
Actor-Critic learning for mean-field control in continuous time
Frikha, Noufel, Germain, Maximilien, Laurière, Mathieu, Pham, Huyên, Song, Xuanye
We study policy gradient for mean-field control in continuous time in a reinforcement learning setting. By considering randomised policies with entropy regularisation, we derive a gradient expectation representation of the value function, which is amenable to actor-critic type algorithms, where the value functions and the policies are learnt alternately based on observation samples of the state and model-free estimation of the population state distribution, either offline or online. In the linear-quadratic mean-field framework, we obtain an exact parametrisation of the actor and critic functions defined on the Wasserstein space. Finally, we illustrate the results of our algorithms with some numerical experiments on concrete examples.
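Schematically, and with generic notation not taken from the paper, the entropy-regularised objective for a randomised policy $\pi$ has the form $$J(\pi) = \mathbb{E}\Big[\int_0^T \big(f(X_t, \mathbb{P}_{X_t}, a_t) + \lambda \log \pi(a_t \mid X_t)\big)\,dt + g(X_T, \mathbb{P}_{X_T})\Big], \qquad a_t \sim \pi(\cdot \mid X_t),$$ where $\mathbb{P}_{X_t}$ is the law of the state (the mean-field term) and $\lambda > 0$ weights the entropy regularisation; the actor updates $\pi$ along an estimated gradient of $J$ while the critic estimates the associated value function.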
Deep Learning for Mean Field Optimal Transport
Baudelet, Sébastien, Frénais, Brieuc, Laurière, Mathieu, Machtalay, Amal, Zhu, Yuchen
Mean field control (MFC) problems have been introduced to study social optima in very large populations of strategic agents. The main idea is to consider an infinite population and to simplify the analysis by using a mean field approximation. These problems can also be viewed as optimal control problems for McKean-Vlasov dynamics. They have found applications in a wide range of fields, from economics and finance to social sciences and engineering. Usually, the goal for the agents is to minimize a total cost which consists of the integral of a running cost plus a terminal cost. In this work, we consider MFC problems in which there is no terminal cost but, instead, the terminal distribution is prescribed. We call such problems mean field optimal transport problems, since they can be viewed as a generalization of classical optimal transport problems when mean field interactions occur in the dynamics or the running cost function. We propose three numerical methods based on neural networks. The first one is based on directly learning an optimal control. The second one amounts to solving a forward-backward PDE system characterizing the solution. The third one relies on a primal-dual approach. We illustrate these methods with numerical experiments conducted on two families of examples.
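The first (direct control) method can be sketched as follows, with every modelling choice below made up for illustration: 1-D dynamics $dX_t = \alpha_t\,dt + \sigma\,dW_t$, a quadratic running cost, Gaussian initial and target laws, and the prescribed terminal distribution enforced through a penalty on a sorted-sample 1-Wasserstein proxy (the paper's actual formulations and examples differ).

```python
# Illustrative sketch of the "direct control" idea for mean field optimal
# transport: a neural network alpha_theta(t, x) is trained by simulating
# particles and penalizing the mismatch with the prescribed terminal law.
import torch

torch.manual_seed(0)
N, T, dt, sigma = 512, 20, 0.05, 0.1
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Samples from the prescribed terminal distribution (assumed Gaussian here).
x_target = 2.0 + 0.5 * torch.randn(N, 1)

for it in range(2000):
    x = -2.0 + 0.5 * torch.randn(N, 1)             # initial distribution (assumed)
    running_cost = torch.zeros(())
    for k in range(T):
        t = torch.full((N, 1), k * dt)
        a = net(torch.cat([t, x], dim=1))          # feedback control alpha_theta(t, x)
        running_cost = running_cost + 0.5 * (a ** 2).mean() * dt
        x = x + a * dt + sigma * dt ** 0.5 * torch.randn(N, 1)
    # 1-D Wasserstein-1 proxy via sorted samples, penalizing terminal mismatch.
    w1 = (x.sort(dim=0).values - x_target.sort(dim=0).values).abs().mean()
    loss = running_cost + 10.0 * w1
    opt.zero_grad(); loss.backward(); opt.step()
```

A more faithful treatment would include mean-field interaction in the dynamics or running cost, and would replace the penalty with the PDE-based or primal-dual formulations the paper develops.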
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Laurière, Mathieu, Perrin, Sarah, Girgin, Sertan, Muller, Paul, Jain, Ayush, Cabannes, Théophile, Piliouras, Georgios, Pérolat, Julien, Élie, Romuald, Pietquin, Olivier, Geist, Matthieu
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scaling up with RL is that existing algorithms for solving MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from trivial for non-linear function approximators that enjoy good generalization properties, e.g., neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy by distilling historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform state-of-the-art baselines from the literature.
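A minimal sketch of the first (distillation) idea, assuming a one-hot-encoded finite state space and a buffer of state-action pairs gathered uniformly from past best responses; all sizes, names, and the random placeholder data are illustrative only:

```python
# Distillation of historical (state, action) data into a single network that
# represents the averaged (mixed) strategy of past best responses.
import torch

n_states, n_actions = 10, 4
policy = torch.nn.Sequential(torch.nn.Linear(n_states, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Placeholder buffer; in practice these come from rollouts of past best responses.
states = torch.nn.functional.one_hot(torch.randint(n_states, (1024,)), n_states).float()
actions = torch.randint(n_actions, (1024,))

for _ in range(100):
    # Cross-entropy fits the network's action distribution to the empirical
    # action frequencies in the buffer, i.e. the mixed strategy.
    loss = torch.nn.functional.cross_entropy(policy(states), actions)
    opt.zero_grad(); loss.backward(); opt.step()
```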
Mean Field Games Flock! The Reinforcement Learning Way
Perrin, Sarah, Laurière, Mathieu, Pérolat, Julien, Geist, Matthieu, Élie, Romuald, Pietquin, Olivier
We present a method enabling a large number of agents to learn how to flock, a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimension. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution requiring only very weak assumptions. Our algorithm finds a Nash equilibrium, and the agents adapt their velocity to match the average velocity of the neighboring flock. We use Fictitious Play and alternate: (1) computing an approximate best response with Deep RL, and (2) estimating the next population distribution with NF. We show numerically that our algorithm learns multi-group or high-dimensional flocking with obstacles.
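For context, a typical Cucker-Smale-type flocking reward (the exact form used in the paper may differ) penalises the gap between an agent's velocity and a kernel-weighted average of the population's: $$r(x, v, \mu) = -\Big\|\, v - \frac{\int \phi(x - x')\, v'\, \mu(dx', dv')}{\int \phi(x - x')\, \mu(dx', dv')} \Big\|^2,$$ where $\phi \ge 0$ is an interaction kernel and $\mu$ the population distribution over positions and velocities.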
Scaling up Mean Field Games with Online Mirror Descent
Pérolat, Julien, Perrin, Sarah, Élie, Romuald, Laurière, Mathieu, Piliouras, Georgios, Geist, Matthieu, Tuyls, Karl, Pietquin, Olivier
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single- and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions of states. This study establishes the state of the art for learning in large-scale multi-agent and multi-population games.
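As a minimal illustration of the OMD update rule (on a toy one-shot crowd-aversion game chosen here for brevity, not one of the paper's benchmarks), the policy is the softmax of accumulated payoffs:

```python
# Online Mirror Descent on a one-shot crowd-aversion MFG over n locations:
# accumulate payoffs, play their softmax; the equilibrium is uniform.
import numpy as np

n, eta, n_iters = 5, 1.0, 500
y = np.zeros(n)                                 # cumulative payoffs (dual variable)
for _ in range(n_iters):
    pi = np.exp(y - y.max()); pi /= pi.sum()    # mirror step: softmax policy
    mu = pi                                     # distribution induced by the policy
    payoff = -mu                                # crowd aversion: avoid crowded spots
    y += eta * payoff                           # accumulate payoffs

exploitability = payoff.max() - payoff @ pi     # gap to the best response
print(pi.round(3), float(exploitability))
```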
Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Perrin, Sarah, Pérolat, Julien, Laurière, Mathieu, Geist, Matthieu, Élie, Romuald, Pietquin, Olivier
In this paper, we deepen the analysis of the continuous-time Fictitious Play learning algorithm in various finite-state Mean Field Game settings (finite horizon, $\gamma$-discounted), allowing in particular for the introduction of an additional common noise. We first present a theoretical convergence analysis of the continuous-time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. This analysis emphasizes the use of exploitability as a relevant metric for evaluating convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments provided in both model-based and model-free settings. We thereby provide, for the first time, converging learning dynamics for Mean Field Games in the presence of common noise.
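For a policy $\pi$ with induced population distribution $\mu^{\pi}$, exploitability is defined (in generic notation) as $$\phi(\pi) = \max_{\pi'} J(\pi', \mu^{\pi}) - J(\pi, \mu^{\pi}),$$ i.e. the payoff a representative player could gain by deviating unilaterally while the population sticks to $\pi$. It satisfies $\phi(\pi) \ge 0$, with equality exactly at a Nash equilibrium, which is why its $O(\frac{1}{t})$ decay along Fictitious Play certifies convergence.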
Approximate Fictitious Play for Mean Field Games
Élie, Romuald, Pérolat, Julien, Laurière, Mathieu, Geist, Matthieu, Pietquin, Olivier
The theory of Mean Field Games (MFG) makes it possible to characterize the Nash equilibria of games with an infinite number of identical players, and provides a convenient and relevant mathematical framework for the study of games with a large number of interacting agents. Until very recently, the literature only considered Nash equilibria between fully informed players. In this paper, we focus on the realistic setting where agents with no prior information on the game learn their best-response policy through repeated experience. We study the convergence to a (possibly approximate) Nash equilibrium of a fictitious play iterative learning scheme where the best response is approximately computed, typically by a reinforcement learning (RL) algorithm. Notably, we show for the first time the convergence of model-free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space setting, where the best response of the iterative fictitious play scheme is computed with a deep RL algorithm.
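For comparison with the OMD sketch above, here is the same toy crowd-aversion game solved by fictitious play; the exact argmax best response below is precisely the step that the paper replaces with an approximate, RL-computed one:

```python
# Fictitious play on a one-shot crowd-aversion game: best-respond to the
# averaged population distribution, then update the running average.
import numpy as np

n, n_iters = 5, 500
mu_bar = np.ones(n) / n                          # average of past best responses
for t in range(1, n_iters + 1):
    payoff = -mu_bar                             # payoffs against the averaged population
    br = np.zeros(n); br[payoff.argmax()] = 1.0  # exact best response (RL would approximate it)
    mu_bar += (br - mu_bar) / (t + 1)            # fictitious-play averaging
print(mu_bar.round(3))                           # approaches the uniform equilibrium
```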