Supplementary Material for Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
–Neural Information Processing Systems
All the source code can be found at our project website https://sites.google.com/view/ The proof is largely based on [2]. The speaker and listener obtain +1 reward when the listener covers the correct landmark. We construct the Hard-Spread scenario by adding walls to separate the room into three parts. For the tasks in the particle-world environment, we evaluate the performance of our algorithm and the baselines using the average coverage of landmarks in the last five evaluation steps within every episode.
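The evaluation metric described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a per-step landmark coverage rate (a value in [0, 1]) has already been recorded for each step of an episode, and the names `final_coverage`, `evaluate`, and `last_k` are hypothetical.

```python
from typing import Sequence


def final_coverage(coverage_per_step: Sequence[float], last_k: int = 5) -> float:
    """Average the coverage rates of the last `last_k` steps of one episode.

    `coverage_per_step` is assumed to hold one coverage rate per step;
    how coverage itself is measured is not specified here.
    """
    if not coverage_per_step:
        raise ValueError("episode has no recorded steps")
    tail = coverage_per_step[-last_k:]  # shorter episodes use all their steps
    return sum(tail) / len(tail)


def evaluate(episodes: Sequence[Sequence[float]], last_k: int = 5) -> float:
    """Average the per-episode score over several evaluation episodes."""
    if not episodes:
        raise ValueError("no episodes to evaluate")
    scores = [final_coverage(ep, last_k) for ep in episodes]
    return sum(scores) / len(scores)
```

Averaging only the final steps scores the configuration the agents settle into, rather than the time they spend moving toward the landmarks.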