SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Neural Information Processing Systems

In the single-agent setting, unsupervised learning has been incorporated into RL so that the agent can acquire diverse skills without any extrinsic reward from the environment; this setting is known as unsupervised reinforcement learning (URL).
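
URL methods typically replace the missing extrinsic reward with an intrinsic one. As one well-known example (the DIAYN-style objective, not necessarily the formulation used in SPD), an agent executing skill z is rewarded by how confidently a learned discriminator can infer z from the visited state: r = log q(z|s) - log p(z). The sketch below assumes a uniform skill prior p(z) and takes the discriminator's predicted skill distribution as given; the function name `skill_reward` is hypothetical.

```python
import math
from typing import Sequence


def skill_reward(skill_probs: Sequence[float], skill: int) -> float:
    """DIAYN-style intrinsic reward r = log q(z|s) - log p(z).

    skill_probs: the discriminator's predicted distribution q(.|s) over skills
        for the current state (assumed given, e.g. a softmax output; entries
        are assumed to be strictly positive).
    skill: index of the skill z the agent is currently executing.
    A uniform prior p(z) = 1 / num_skills is assumed.
    """
    num_skills = len(skill_probs)
    if not 0 <= skill < num_skills:
        raise ValueError("skill index out of range")
    if not math.isclose(sum(skill_probs), 1.0, rel_tol=1e-6):
        raise ValueError("skill_probs must sum to 1")
    log_q = math.log(skill_probs[skill])
    log_p = -math.log(num_skills)  # log of the uniform prior 1 / num_skills
    return log_q - log_p


if __name__ == "__main__":
    # Discriminator is fairly sure the state came from skill 0: positive reward.
    print(skill_reward([0.7, 0.1, 0.1, 0.1], 0))
    # Discriminator cannot tell skills apart: reward is (numerically) zero.
    print(skill_reward([0.25, 0.25, 0.25, 0.25], 2))
```

The reward is positive only when the visited state makes the current skill more identifiable than chance, which is what pushes different skills toward distinguishable behaviour.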

Supplementary Material Proof of Proposition

Neural Information Processing Systems

Referring to Eq. (3), we see that the left side equals $H(\{i\} \mid S)$ … $H(\{i\} \mid S$ …

For experiments in Section 5.1, we use a batch size of 32 sentences and the Adam optimizer with a learning rate of 1e-3. We train for 40 epochs and report the test metric at the best validation epoch. For experiments in Section 5.2, all checkpoints are ResNet-50 models, trained with a batch size of 128 and an initial learning rate of 0.1. We train for 200 epochs, decaying the learning rate at the 60th, 120th, and 160th epochs.
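
The two training details above (the step-decayed learning rate for the Section 5.2 runs and reporting the test metric at the best validation epoch for Section 5.1) can be sketched as follows. This is a minimal illustration under stated assumptions: epochs are 0-indexed, "decay at the 60th epoch" is read as using the lower rate from epoch index 60 onward, and the decay factor `gamma=0.2` is an assumption because the source does not state it. The helper names `multistep_lr` and `score_at_best_val` are hypothetical, and the latter assumes higher validation scores are better.

```python
from typing import Sequence, Tuple


def multistep_lr(epoch: int, base_lr: float = 0.1,
                 milestones: Sequence[int] = (60, 120, 160),
                 gamma: float = 0.2) -> float:
    """Learning rate for a 0-indexed epoch under multi-step decay.

    The base rate is multiplied by gamma once for every milestone reached.
    gamma = 0.2 is an assumed value; the source gives only the milestones.
    """
    num_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** num_decays


def score_at_best_val(val_scores: Sequence[float],
                      test_scores: Sequence[float]) -> Tuple[int, float]:
    """Return (epoch, test score) at the epoch with the best validation score.

    Assumes higher validation scores are better; ties go to the earliest epoch.
    """
    if not val_scores or len(val_scores) != len(test_scores):
        raise ValueError("need one validation and one test score per epoch")
    best_epoch = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_epoch, test_scores[best_epoch]


if __name__ == "__main__":
    # Distinct learning rates over a 200-epoch schedule, largest first.
    schedule = [multistep_lr(e) for e in range(200)]
    print(sorted(set(schedule), reverse=True))
    # Toy per-epoch scores, made up purely for illustration.
    print(score_at_best_val([0.50, 0.80, 0.70], [0.40, 0.75, 0.90]))
```

In PyTorch the same schedule is usually expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=...)`, stepped once per epoch; the pure-Python version above just makes the piecewise-constant rate explicit. Selecting the epoch by validation score and only then reading off the test score keeps the test set out of model selection.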