Supplementary Materials of The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

Neural Information Processing Systems

We consider the 3 fully cooperative tasks from the original set shown in Figure 1(a): Spread, Comm, and Reference. "Use feature normalization" refers to whether feature normalization is applied to the network input. In this appendix section, we include results which demonstrate the benefit of parameter sharing. Note that the global state fed to the value network contains agent-specific information, such as available actions and relative distances to other agents. When an agent dies, these agent-specific features become zero while the remaining agent-agnostic features stay nonzero; this leads to a drastic distribution shift in the critic input compared to states in which the agent is alive.
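The distribution-shift issue above can be made concrete with a small sketch. The function name, feature shapes, and masking convention below are illustrative assumptions, not the paper's actual implementation: the critic input concatenates agent-agnostic global features with per-agent features, and a dead agent's slice is zeroed while the global part stays intact.

```python
import numpy as np

def build_critic_input(global_features, agent_features, alive_mask):
    """Hypothetical critic-input construction.

    global_features: (d_g,) agent-agnostic features (remain nonzero).
    agent_features: (n_agents, d_a) agent-specific features.
    alive_mask: (n_agents,) of 0/1; dead agents' slices are zeroed.
    Returns one flat critic input vector.
    """
    masked = agent_features * alive_mask[:, None]  # zero out dead agents
    return np.concatenate([global_features, masked.ravel()])

global_features = np.array([1.0, 2.0])   # agent-agnostic part
agent_features = np.ones((3, 2))         # per-agent part
alive = np.array([1.0, 0.0, 1.0])        # agent 1 has died

x = build_critic_input(global_features, agent_features, alive)
# x = [1, 2, 1, 1, 0, 0, 1, 1]: the dead agent's slice collapses to
# zero while global features persist -- the shift discussed above.
```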


Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Neural Information Processing Systems

Recently, such difficulty in multi-agent learning has been eased by the introduction of centralized training for decentralized execution (CTDE) [11, 45], which allows agents to access global information and opponents' actions during the training phase.
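A minimal sketch of the CTDE split, with assumed shapes and linear "networks" standing in for real actors and critics: each actor maps only its own local observation to an action (decentralized execution), while a centralized critic conditions on the joint observation, which is available during training but not at execution time.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, obs_dim, n_actions = 2, 4, 3
# Toy linear parameters standing in for actor/critic networks.
actor_w = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_agents)]
critic_w = rng.normal(size=(n_agents * obs_dim,))

def act(i, local_obs):
    """Decentralized execution: agent i sees only its own observation."""
    logits = local_obs @ actor_w[i]
    return int(np.argmax(logits))

def central_value(all_obs):
    """Centralized training: the critic scores the joint observation."""
    return float(np.concatenate(all_obs) @ critic_w)

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
actions = [act(i, obs[i]) for i in range(n_agents)]
v = central_value(obs)  # only computable with global information
```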


Supplementary Materials of The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

Neural Information Processing Systems

We assume here, for notational convenience, that all agents share the critic and actor networks. In continuous action spaces, actions are sampled from a Gaussian distribution. In the loss functions above, B refers to the batch size and n refers to the number of agents. The Multi-agent Particle-World Environment (MPE) was introduced in (Lowe et al., 2017). The StarCraft II Micromanagement Challenge (SMAC) tasks were introduced in (Rashid et al., 2019).
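The continuous-action case above can be sketched as follows. The shapes and the fixed log-std are illustrative assumptions: a shared actor produces a Gaussian mean per action dimension, actions are sampled from that Gaussian, and the per-sample log-probabilities used in a PPO-style loss carry both the batch dimension B and the agent dimension n.

```python
import numpy as np

rng = np.random.default_rng(0)

B, n, act_dim = 8, 3, 2        # batch size B, n agents, action dimension
mean = rng.normal(size=(B, n, act_dim))      # actor output (illustrative)
log_std = np.full((B, n, act_dim), -0.5)     # assumed fixed log-std

# Sample actions from the Gaussian via the reparameterized form.
actions = mean + np.exp(log_std) * rng.normal(size=mean.shape)

# Gaussian log-density per action dimension, summed over dimensions;
# a PPO-style loss would then average such terms over both B and n.
log_prob = (-0.5 * ((actions - mean) / np.exp(log_std)) ** 2
            - log_std - 0.5 * np.log(2 * np.pi)).sum(axis=-1)
```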