Reviews: Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: a Mean Field Theoretic Approach

Neural Information Processing Systems 

This paper introduces a mean-field model of multiagent Q-learning in repeated symmetric games. The model assumes that at each time step every agent plays a symmetric game with m other randomly chosen agents, and it considers the limit in which both n (the number of agents) and m tend to infinity. In this setting, the authors derive a Fokker-Planck equation governing the time evolution of the distribution of the agents' Q-values. The review scores were widely split: two reviewers rated the paper well above the threshold, whereas Reviewer #1 rated it negatively.