A Other related works

Neural Information Processing Systems 

Let us briefly discuss additional prior works on learning equilibrium solutions in MARL, a topic that has attracted a surge of interest in recent years. Roughly speaking, previous NE-finding algorithms for two-player zero-sum Markov games can be categorized into model-based algorithms [52, 79, 43], value-based algorithms [4, 5, 73, 54, 31, 15], and policy-based algorithms [10, 22, 71, 82, 14, 81, 11]. In particular, Bai et al. [5] and Jin et al. [31] developed the first algorithms to beat the curse of multiple agents in two-player zero-sum MGs, while Jin et al. [31], Daskalakis et al. [23], Mao and Başar [44], and Song et al. [63] further demonstrated how to accomplish the same goal when learning other computationally tractable solution concepts (e.g., coarse correlated equilibria) in general-sum multi-player Markov games. The recent works of Cui and Du [17, 18] and Yan et al. [74] studied how to alleviate the sample-size scaling with the number of agents in the presence of offline data, with Cui and Du [18] providing a sample-efficient algorithm that also learns NEs in multi-agent Markov games (despite computational intractability). We shall also briefly remark on prior works concerning RL with a generative model.