Q-Learning for Mean-Field Controls
Gu, Haotian, Guo, Xin, Wei, Xiaoli, Xu, Renyuan
Multi-agent reinforcement learning (MARL) has been applied to many challenging problems, including two-team computer games, autonomous driving, and real-time bidding. Despite this empirical success, theoretical studies of MARL algorithms are conspicuously absent, mainly because of the curse of dimensionality: the joint state-action space grows exponentially as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to $N$-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate the strong performance of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.
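To make the idea of kernel-based Q-learning on a deterministic MDP concrete, the following is a minimal sketch and not the paper's CDD-Q algorithm. It assumes a hypothetical one-dimensional toy problem (the dynamics and reward in `step`, the grid of centers, the triangular kernel, and all parameter values are illustrative assumptions, not taken from the paper). Q-values are stored only on a finite net of states and extended to arbitrary states by normalized kernel weights; because the dynamics are deterministic, each update can overwrite the stored value without a decaying learning rate.

```python
import numpy as np

def kernel_weights(s, centers, eps):
    # Triangular kernel of width eps, normalized so the weights sum to 1.
    w = np.maximum(0.0, 1.0 - np.abs(s - centers) / eps)
    return w / w.sum()

def step(s, a):
    # Hypothetical deterministic simulator: move by a, stay inside [0, 1].
    # The reward penalizes squared distance from the target state 0.5.
    s_next = float(np.clip(s + a, 0.0, 1.0))
    reward = -(s_next - 0.5) ** 2
    return s_next, reward

def kernel_q_learning(n_centers=21, actions=(-0.1, 0.0, 0.1),
                      gamma=0.9, sweeps=200):
    centers = np.linspace(0.0, 1.0, n_centers)   # finite net over the state space
    eps = centers[1] - centers[0]                # kernel width = net spacing
    q = np.zeros((n_centers, len(actions)))      # Q-values stored on the net only
    for _ in range(sweeps):
        q_new = np.empty_like(q)
        for i, s in enumerate(centers):
            for j, a in enumerate(actions):
                # Query the simulator; only the sample (s, a, r, s') is used.
                s_next, r = step(s, a)
                # Kernel regression: interpolate Q at s' from the net values.
                w = kernel_weights(s_next, centers, eps)
                q_new[i, j] = r + gamma * np.max(w @ q)
        q = q_new
    return centers, np.array(actions), q

if __name__ == "__main__":
    centers, actions, q = kernel_q_learning()
    greedy = actions[np.argmax(q, axis=1)]
    for s, a in zip(centers, greedy):
        print(f"state {s:.2f} -> greedy action {a:+.1f}")
```

In this toy setting the greedy policy moves toward 0.5 and then stays there. The kernel weights are nonnegative and sum to one, so the interpolation step cannot expand the sup-norm distance between two Q-tables, which is why the simple sweep above settles down under discounting.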
Feb-10-2020
- Country:
  - North America
    - United States > California
      - Los Angeles County > Los Angeles (0.14)
      - Santa Clara County > Stanford (0.04)
      - Alameda County > Berkeley (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe > United Kingdom
    - England > Oxfordshire > Oxford (0.14)
- Genre:
  - Research Report (0.40)
- Industry:
  - Information Technology (0.87)
  - Leisure & Entertainment > Games (0.87)
  - Transportation > Ground
    - Road (0.87)
- Technology: