Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing

Neural Information Processing Systems 

Learning in real-world multiagent tasks is challenging due to the usual partial observability of each agent. Previous efforts alleviate the partial observability by using historical hidden states with Recurrent Neural Networks; however, they do not consider the multiagent characteristics that either the multiagent observation consists of a number of object entities or the action space shows clear entity interactions.