A Proofs

Neural Information Processing Systems 

We will prove it by contradiction.

To prove Lemma 2, we will use the following lemma. This is a special case of the simulation lemma (Kearns and Singh, 2002). We will prove it by contradiction.

There is a sizeable body of literature that concentrates on the non-stationarity issues arising from having multiple agents learning simultaneously in the same environment (Laurent et al., 2011;

In contrast, Foerster et al. (2018a) add an extra term to

The works by Lowe et al. (2017) and Foerster

The works by de Witt et al. (2020) and Yu et al. (2021) show that

Yu et al. attribute the positive empirical results to the clipping parameter

Global simulator, observation functions, and joint policy

for n 0, ...,N/T do s

The bar plots show the total runtime of training for 4M timesteps with the three simulators.