Appendix Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning

Neural Information Processing Systems 

In this section, we train policies for different scenarios to validate that the tasks defined in Gigastep can be solved with multi-agent RL algorithms. In particular, we use multi-agent PPO implemented in JAX. In competitive or adversarial MARL, no objective reward measure exists, because the reward an agent collects depends on the relative strength of the opposing agent's policy. Therefore, to measure training progress, we evaluate the current policy against checkpoints of the same policy saved at earlier training iterations. If training is making progress, the current policy should outperform these earlier checkpoints.
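The checkpoint-based evaluation described above can be sketched as follows. This is a minimal illustration in plain Python, not the Gigastep API or the authors' evaluation code: `win_rate_vs_checkpoints`, `play_match`, and `toy_match` are hypothetical names, and the toy "policies" are scalar skill values rather than trained networks. Scoring a draw as half a win is likewise an assumption.

```python
import math
import random
from typing import Callable, Dict, Mapping, TypeVar

P = TypeVar("P")  # stand-in for whatever object represents a policy

# Hypothetical match interface: returns +1 if the first policy wins,
# -1 if it loses, and 0 for a draw.
MatchFn = Callable[[P, P, random.Random], int]


def win_rate_vs_checkpoints(
    current: P,
    checkpoints: Mapping[str, P],
    play_match: MatchFn,
    n_episodes: int = 100,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate the current policy's score against each earlier checkpoint.

    A win counts 1, a draw 0.5, a loss 0 (an assumed convention), so a
    score above 0.5 means the current policy outperforms that checkpoint.
    """
    rng = random.Random(seed)
    scores: Dict[str, float] = {}
    for name, past in checkpoints.items():
        total = 0.0
        for _ in range(n_episodes):
            outcome = play_match(current, past, rng)
            if outcome > 0:
                total += 1.0
            elif outcome == 0:
                total += 0.5
        scores[name] = total / n_episodes
    return scores


def toy_match(skill_a: float, skill_b: float, rng: random.Random) -> int:
    """Toy stand-in for an environment rollout (assumption, not Gigastep).

    The first player wins with a logistic probability in the skill gap.
    """
    p_a_wins = 1.0 / (1.0 + math.exp(skill_b - skill_a))
    return 1 if rng.random() < p_a_wins else -1


if __name__ == "__main__":
    # Pretend skill grows with training iteration; the values are made up.
    earlier = {"iter_0": 0.0, "iter_100": 1.0}
    scores = win_rate_vs_checkpoints(2.0, earlier, toy_match, n_episodes=2000)
    for name, score in scores.items():
        print(f"current vs {name}: {score:.3f}")
```

In an actual setup, `play_match` would roll out one episode in the environment with the two policies controlling opposing teams and report which team won. A score that stays above 0.5 against older checkpoints is the signal of improvement that the text describes.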
