Neural Information Processing Systems 

Here we report both the median test win rate and the mean episodic return with the 95% confidence interval.

Hyperparameter            MAPPO          MAA2C
hidden dimension          128            128
learning rate             0.0003         0.0005
reward standardisation    False          True
network type              MLP/GRU/ATM    MLP/GRU/ATM
entropy coefficient       0.001          0.01
target update             0.01 (soft)    0.01 (soft)
n-step                    5              10

ATM is used as the individual policy network for the agents, and its detailed network configuration is given in Table 5. Table 7 gives the translation of agent 0's decision process in one battle on 5m_vs_6m.
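The hyperparameter values above can be read more concretely through a small sketch. It is not the authors' code. The sketch stores the table's MAPPO and MAA2C values in a config object. It then shows what a soft target update with rate 0.01 and an n-step return usually compute, and one common way to report a median win rate and a mean return with a 95% confidence interval. The names AlgoConfig, soft_update, n_step_return and mean_with_ci95 are hypothetical. The discount factor gamma does not appear in the table and is only a function argument here. The normal-approximation confidence interval is also an assumption, since the source does not say how its intervals were computed; a bootstrap interval would be an equally plausible choice.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AlgoConfig:
    """Hyperparameters from the table above (hypothetical container type)."""
    hidden_dim: int
    learning_rate: float
    reward_standardisation: bool
    entropy_coef: float
    target_update_tau: float  # rate of the soft (Polyak) target update
    n_step: int


# Values copied from the MAPPO and MAA2C columns of the table.
MAPPO = AlgoConfig(128, 0.0003, False, 0.001, 0.01, 5)
MAA2C = AlgoConfig(128, 0.0005, True, 0.01, 0.01, 10)


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def n_step_return(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float, n: int) -> float:
    """Discounted sum of the first n rewards plus a discounted bootstrap value.

    Assumes the episode does not terminate within these n steps.
    gamma is an assumed parameter and does not come from the table.
    """
    if len(rewards) < n:
        raise ValueError("need at least n rewards for an n-step return")
    g = sum((gamma ** k) * r for k, r in enumerate(rewards[:n]))
    return g + (gamma ** n) * bootstrap_value


def mean_with_ci95(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, half_width


if __name__ == "__main__":
    print(MAPPO, MAA2C)
    print(soft_update([0.0, 1.0], [1.0, 1.0], MAPPO.target_update_tau))
    print(n_step_return([1.0] * 5, bootstrap_value=0.0, gamma=0.99, n=MAPPO.n_step))
    print(statistics.median([0.2, 0.9, 0.5]))  # median test win rate over runs
    print(mean_with_ci95([1.0, 2.0, 3.0, 4.0, 5.0]))  # mean return and CI half-width
```

With tau = 0.01, each update moves the target parameters 1% of the way toward the online parameters. A larger n-step value (10 for MAA2C versus 5 for MAPPO) bootstraps further into the future, which lowers bias at the cost of higher variance.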
