Appendix APseudocodeofDRE-MARL

Neural Information Processing Systems 

The property of the received reward in this environment isset tobecollaborative. For each agent, we first sample rewardsˆri from estimated reward distributionsDi. NetworkArchitecture. Thedecentralized actors anddistributional rewardestimation networks adopt the simple fully-connected feedforward neural network with three layers in our framework. The two hidden layers' units are 64. The centralized critic uses a graph attention neural network with eight attention heads, and each head'shidden unit isset to8tocapture the dynamic relationship between agents.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found