Learning to Communicate
Before an agent takes an action, it observes the communications from other agents from the previous time step as well as the locations of all entities and objects in the world. It stores that communication in a private recurrent neural network, giving it a memory for the words it hears. We use discrete communication actions (messages formed of separate, word-like symbols) sent over a differentiable communication channel. A communication channel is differentiable if it allows agents to directly inform each other about what message they should have sent at each time step, by slightly altering their messages to make a positive change in the reward both agents expect to receive. Agents accomplish this by calculating the gradient of future reward with respect to changes in the sent messages (i.e.
Jan-1-2018, 16:10:59 GMT
- Technology: