A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

Neural Information Processing Systems 

In this paper, we first observe that policies learned using independent reinforcement learning (InRL) can overfit to the other agents' policies during training and then fail to generalize when paired with different agents during execution. We introduce a new metric, joint-policy correlation, to quantify this effect.
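The following is a minimal sketch of how such a metric can be computed. It assumes a two-player setting in which N independent training runs (e.g. different random seeds) each produce a joint policy. Entry (i, j) of an N x N joint-policy correlation (JPC) matrix holds the average return when player 1 uses the policy from run i and player 2 uses the policy from run j. Diagonal entries therefore pair policies that were trained together, and off-diagonal entries pair policies that never met during training. The summary statistic shown, an average proportional loss (D - O) / D, where D is the mean of the diagonal and O is the mean of the off-diagonal, reflects our reading of how the effect is quantified. The callback `evaluate` and both function names are hypothetical illustrations, not an existing API.

```python
from typing import Callable, List, Sequence


def jpc_matrix(n: int, evaluate: Callable[[int, int], float]) -> List[List[float]]:
    """Build an n x n joint-policy correlation matrix.

    `evaluate(i, j)` is a hypothetical callback that plays player 1's policy
    from training run i against player 2's policy from training run j and
    returns the average episode return.
    """
    return [[evaluate(i, j) for j in range(n)] for i in range(n)]


def average_proportional_loss(jpc: Sequence[Sequence[float]]) -> float:
    """Relative drop in return when policies from different runs are paired.

    Returns (D - O) / D, where D is the mean diagonal entry (policies trained
    together) and O is the mean off-diagonal entry (policies trained apart).
    A value near 0 means little overfitting to training partners; larger
    values mean policies generalize poorly to unfamiliar partners.
    """
    n = len(jpc)
    if n < 2 or any(len(row) != n for row in jpc):
        raise ValueError("JPC matrix must be square with at least 2 runs")
    diag_mean = sum(jpc[i][i] for i in range(n)) / n
    if diag_mean == 0:
        raise ValueError("mean diagonal return is zero; loss is undefined")
    off_diag_mean = sum(
        jpc[i][j] for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    return (diag_mean - off_diag_mean) / diag_mean


if __name__ == "__main__":
    # Toy data: co-trained pairs score 10, cross-run pairs score 5 on average.
    example = [[10.0, 4.0], [6.0, 10.0]]
    print(average_proportional_loss(example))  # prints 0.5
```

In this toy matrix, pairing policies from different runs halves the return relative to pairing co-trained policies, which is the kind of gap the metric is meant to expose.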