Learning Distinguishable Trajectory Representation with Contrastive Loss
Tianxu Li, Juan Li, Yang Zhang
Neural Information Processing Systems
Policy network parameter sharing is a common technique in advanced deep multi-agent reinforcement learning (MARL) algorithms: it improves learning efficiency by reducing the number of policy parameters and by sharing experience among agents. However, agents that share policy parameters tend to learn similar behaviors. To encourage multi-agent diversity, prior work typically maximizes the mutual information between trajectories and agent identities using variational inference. This category of methods, however, is prone to inefficient exploration because of limited trajectory visitations. To address this limitation, and inspired by how pre-trained models are learned, we propose Contrastive Trajectory Representation (CTR), a novel method that encourages multi-agent diversity by learning distinguishable trajectory representations.
May-30-2025, 02:44:15 GMT
- Country:
  - North America > Canada (0.28)
- Genre:
  - Research Report > Experimental Study (0.93)
- Industry:
  - Government > Military (0.46)
  - Information Technology (0.67)
  - Leisure & Entertainment > Games (0.47)