Reviews: Credit Assignment For Collective Multiagent RL With Global Rewards
–Neural Information Processing Systems
The paper tackles the multi-agent credit assignment problem, a prominent problem in multi-agent systems, by extending existing methods based on difference rewards to settings in which the agent population is large. Although the results are relevant and yield an improvement for large-population systems, the contribution is limited to a modification of existing techniques for a specific setting, one that seemingly requires the number of agents to be large and each agent to observe a count of the agents in its neighbourhood. The results enable improved credit assignment in the presence of noise from other agents' actions, and an improved baseline that reduces variance and, in turn, yields better estimates of the collective policy gradient (under homogeneity assumptions). The analysis applies to a specific setting in which the reward function has a term that is common to all agents and is therefore not decomposable. The paper does not, however, discuss how often this property arises in multi-agent systems.
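For readers unfamiliar with difference rewards, the following is a minimal sketch of the general idea, not the method of the paper under review. It assumes a hypothetical congestion-style global reward that depends only on the count of agents in each zone; the names `global_reward`, `difference_rewards`, and `capacity` are illustrative. Each agent is credited with the global reward minus the global reward obtained when that agent is removed from the counts.

```python
from collections import Counter


def global_reward(counts, capacity=2):
    """Hypothetical global reward: quadratic penalty for each zone over capacity."""
    return -sum(max(0, n - capacity) ** 2 for n in counts.values())


def difference_rewards(zones, capacity=2):
    """Return D_i = G(counts) - G(counts with agent i removed) for every agent i.

    `zones[i]` is the zone occupied by agent i (an assumption of this sketch).
    """
    counts = Counter(zones)
    g = global_reward(counts, capacity)
    rewards = []
    for zone in zones:
        without_i = counts.copy()
        without_i[zone] -= 1  # counterfactual: agent i is absent
        rewards.append(g - global_reward(without_i, capacity))
    return rewards


if __name__ == "__main__":
    zones = ["a", "a", "a", "b"]
    print(global_reward(Counter(zones)))  # shared signal, identical for all agents
    print(difference_rewards(zones))      # per-agent credit
```

In this toy case every agent would receive the same global reward, whereas the difference reward is non-zero only for the agents in the congested zone, which is the kind of per-agent signal that credit assignment methods aim to recover.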
Oct-7-2024, 19:56:18 GMT