SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning typically relies on a well-designed reward signal, which becomes even more challenging to provide in cooperative multi-agent reinforcement learning. Alternatively, unsupervised reinforcement learning (URL) has recently delivered on its promise of learning useful skills and exploring the environment without external supervision. These approaches, however, mainly aim for a single agent to reach distinguishable states, which is insufficient for multi-agent systems, where each agent interacts not only with the environment but also with the other agents. We propose Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning (SPD) to learn generic coordination policies for agents with no extrinsic reward. Specifically, we devise the Synergy Pattern Graph (SPG), a graph depicting the relationships among agents at each time step. Furthermore, we propose an episode-wise divergence measurement to approximate the discrepancy between synergy patterns. To overcome the challenge of sparse return, we decompose the discrepancy of synergy patterns into per-time-step pseudo-rewards. Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in the Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the parameters of the same pretrained policy can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football.
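
To make the core ideas concrete, the sketch below builds a per-time-step Synergy Pattern Graph as an edge-weight matrix over agents and decomposes an episode-wise discrepancy into per-step pseudo-rewards. The particular choice of ζ (inverse pairwise distance) and the Frobenius-norm discrepancy are illustrative assumptions, not the paper's exact definitions:

```python
# Minimal sketch of the SPG idea, under assumed definitions of zeta and
# the episode-wise discrepancy (neither is the paper's exact formulation).
import numpy as np

def synergy_pattern_graph(positions: np.ndarray) -> np.ndarray:
    """Edge-weight matrix W for one time step.

    positions: (n_agents, dim) array of agent positions.
    W[i, j] plays the role of zeta(agent i, agent j); here we assume an
    inverse-distance zeta, so closer agents get larger edge weights.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    w = 1.0 / (1.0 + dist)
    np.fill_diagonal(w, 0.0)  # no self-edges
    return w

def per_step_pseudo_rewards(episode_a, episode_b):
    """Decompose an episode-wise discrepancy into per-step rewards.

    episode_a, episode_b: lists of (n_agents, dim) position arrays, one
    per time step. We assume the episode discrepancy is the sum of
    per-step Frobenius distances between SPGs, so each summand serves
    directly as a per-time-step pseudo-reward.
    """
    rewards = []
    for pos_a, pos_b in zip(episode_a, episode_b):
        wa = synergy_pattern_graph(pos_a)
        wb = synergy_pattern_graph(pos_b)
        rewards.append(float(np.linalg.norm(wa - wb)))
    return rewards

# Usage: two 3-agent episodes of length 4 with random positions.
rng = np.random.default_rng(0)
ep_a = [rng.normal(size=(3, 2)) for _ in range(4)]
ep_b = [rng.normal(size=(3, 2)) for _ in range(4)]
print(per_step_pseudo_rewards(ep_a, ep_b))
```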


A Properties of the discrepancy of synergy patterns as a legitimate pseudometric

Neural Information Processing Systems

To avoid unnecessary confusion, we notate the joint distribution of the L.H.S. as […]. We show that the infimum of the R.H.S. is reached when […]; the joint distribution for the L.H.S. can then be updated accordingly. We prove the triangle inequality by contradiction, similarly to iii).

Recovered hyperparameters: the start step for employing SPD to obtain pseudo-reward is 5000; α, the factor of the regularized term in Eq. (6), is 0.

Recovered notation and setup: each agent selects actions from its discrete action space to move around; a Recurrent Neural Network (RNN) is used in the policy to alleviate partial observability; the weight W of edge {i, j} depicts the relative relation between agents i and j; the Synergy Pattern Function ζ is a general function that depicts agents' relative relations.
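
As a rough illustration of what "legitimate pseudometric" requires, the following sketch numerically checks non-negativity, d(x, x) = 0, symmetry, and the triangle inequality for a stand-in episode discrepancy. The stand-in d (Frobenius distance between time-averaged edge-weight matrices) is an assumption for illustration, not the paper's measure:

```python
# Numerical sanity check of the pseudometric axioms for an assumed
# episode-level discrepancy d (not the paper's exact measure).
import itertools
import numpy as np

def episode_discrepancy(eps_a: np.ndarray, eps_b: np.ndarray) -> float:
    """Assumed d: Frobenius distance between time-averaged SPG weights."""
    return float(np.linalg.norm(eps_a.mean(axis=0) - eps_b.mean(axis=0)))

rng = np.random.default_rng(1)
# 4 episodes, each: 5 time steps of a 3x3 edge-weight matrix.
episodes = [rng.random(size=(5, 3, 3)) for _ in range(4)]

for x, y, z in itertools.permutations(episodes, 3):
    d_xy = episode_discrepancy(x, y)
    d_yz = episode_discrepancy(y, z)
    d_xz = episode_discrepancy(x, z)
    assert d_xy >= 0.0                                    # non-negativity
    assert abs(d_xy - episode_discrepancy(y, x)) < 1e-12  # symmetry
    assert d_xz <= d_xy + d_yz + 1e-12                    # triangle inequality
assert episode_discrepancy(episodes[0], episodes[0]) == 0.0  # d(x, x) = 0
print("pseudometric axioms hold for the sampled episodes")
```

Note that d(x, y) = 0 need not imply x = y here (two different episodes can share the same time-averaged graph), which is exactly why such a discrepancy is a pseudometric rather than a metric.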


SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Neural Information Processing Systems

In the single-agent setting, unsupervised learning has been incorporated into RL to acquire diverse skills without extrinsic reward from the environment; this setting is known as unsupervised reinforcement learning (URL).

