Reviews: Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Neural Information Processing Systems 

The paper contains a very thorough experimental evaluation of the proposed DIAL technique for multi-agent settings, and the reviewer understands that setting up and running these experiments must have taken considerable time and effort. Nevertheless, the paper does not make a very novel contribution. It is clear that the idea of shared memory and of passing message gradients between agents would speed up learning and help find the optimal policy faster, but this may not be a very natural way to do it. For instance, humans working in teams do not have shared memories. Moreover, for humans, messages from other humans are part of their observation at each time step, rather than separate signals that are treated specially as messages and optimized differently from the rest of the observation. Passing message gradients is certainly useful for obtaining trainable message protocols when training a set of agents to perform a repetitive task, but it offers little insight or useful interpretation as to how humans perform tasks in teams.