Review for NeurIPS paper: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
Neural Information Processing Systems
Weaknesses: The first essential issue with the LICA algorithm is that the definition of the centralized value function is unclear. In particular, what exactly is the proposed value function trying to approximate? During training, the centralized value function is conditioned on a sampled joint action (Eq. 3), whereas during the policy update it is conditioned on the concatenation of the action-probability vectors output by each agent's policy. Because of this inconsistency in the value function's input, the critic should not be able to provide a correct value estimate for stochastic policies when the policy gradient is computed. The paper should give further explanation and theoretical analysis of this approach.
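To make the concern concrete, here is a minimal sketch in Python. It is not the paper's implementation: `toy_critic`, its weights, and the two-agent, two-action setup are hypothetical choices made only for illustration. It compares the expected critic value over sampled one-hot joint actions (the kind of input seen during critic training) with the critic evaluated directly on the concatenated action probabilities (the kind of input used in the policy update).

```python
import itertools


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def toy_critic(joint_input):
    """Hypothetical critic that is nonlinear in its action input.

    The weights are arbitrary; any critic that is not linear in the
    action input can show the same effect.
    """
    weights = [1.0, -1.0, 2.0, 0.5]
    s = sum(w * x for w, x in zip(weights, joint_input))
    return s * s


def expected_value_over_samples(policies):
    """Exact expectation of the critic over sampled joint actions,
    with each agent's action encoded as a one-hot vector."""
    sizes = [len(p) for p in policies]
    total = 0.0
    for joint in itertools.product(*[range(n) for n in sizes]):
        prob = 1.0
        joint_input = []
        for agent, action in enumerate(joint):
            prob *= policies[agent][action]
            joint_input += one_hot(action, sizes[agent])
        total += prob * toy_critic(joint_input)
    return total


def value_at_probabilities(policies):
    """Critic evaluated on the concatenated action probabilities."""
    joint_input = [p for policy in policies for p in policy]
    return toy_critic(joint_input)


# Two agents with uniform stochastic policies over two actions each.
stochastic = [[0.5, 0.5], [0.5, 0.5]]
expected = expected_value_over_samples(stochastic)
plugged = value_at_probabilities(stochastic)

# Deterministic policies: probabilities coincide with one-hot actions.
deterministic = [[1.0, 0.0], [1.0, 0.0]]
expected_det = expected_value_over_samples(deterministic)
plugged_det = value_at_probabilities(deterministic)

print(expected, plugged)
print(expected_det, plugged_det)
```

In this toy setting the two quantities differ for the stochastic policies (3.125 versus 1.5625) and coincide for the deterministic ones, since a probability vector then equals a one-hot action. Whether a similar gap arises for the critic in the paper depends on its architecture and training, which is the point on which the review asks for analysis.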
May-30-2025, 00:13:17 GMT