Reviews: A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing Systems 

This paper proposes a new multi-task hierarchical reinforcement learning algorithm. The high-level policy assigns agents to tasks by solving a linear program (or a quadratic program), and the low-level policy is pre-defined. The paper's main contribution is removing the dependence on a fixed number of agents and tasks: it models multi-task assignment as an optimization problem whose objective is built from the correlation between each agent and each task and the correlation between tasks. Once these correlations have been trained on a simple task, a more complex task only requires re-solving the optimization problem, with no retraining, which yields zero-shot generalization. Collaboration patterns between agents in the multi-task problem, such as forming subgroups of agents or spreading agents across tasks at the same time, are expressed as constraints added to the optimization problem that defines the high-level policy.
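To make the summary above concrete, here is a minimal sketch of the assignment step as I understand it, not the authors' implementation. It assumes the agent-task and task-task scores are already available as plain lists (in the paper they would come from the trained model), and it swaps the LP/QP solver for exhaustive search so that it stays dependency-free. The function name `assign`, the parameters `task_task_score` and `max_agents_per_task`, and the exact form of the quadratic term are all hypothetical choices made for illustration.

```python
from itertools import combinations, product


def assign(agent_task_score, task_task_score=None, max_agents_per_task=None):
    """Pick one task per agent by brute force (stand-in for the LP/QP solve).

    agent_task_score[i][j]: assumed score for giving task j to agent i.
    task_task_score[j][l]:  assumed score added for every pair of agents
                            assigned to tasks j and l (quadratic term).
    max_agents_per_task:    optional cap that encodes a "spread agents
                            across tasks" collaboration pattern.
    """
    n_agents = len(agent_task_score)
    n_tasks = len(agent_task_score[0])
    best_assignment, best_value = None, float("-inf")

    # Enumerate every mapping agent -> task; only feasible for tiny instances.
    for assignment in product(range(n_tasks), repeat=n_agents):
        # Constraint: skip assignments that overload any single task.
        if max_agents_per_task is not None:
            if any(assignment.count(t) > max_agents_per_task
                   for t in range(n_tasks)):
                continue

        # Linear part of the objective: sum of agent-task scores.
        value = sum(agent_task_score[i][t] for i, t in enumerate(assignment))

        # Optional quadratic part: task-task score for each pair of agents.
        if task_task_score is not None:
            value += sum(task_task_score[assignment[i]][assignment[k]]
                         for i, k in combinations(range(n_agents), 2))

        if value > best_value:
            best_assignment, best_value = assignment, value

    return best_assignment, best_value


# Small instance: 2 agents, 2 tasks.
small = [[3.0, 1.0], [2.0, 0.5]]
unconstrained = assign(small)
spread = assign(small, max_agents_per_task=1)

# Larger instance: same scoring scheme, one more agent, no retraining step.
larger = [[3.0, 1.0], [2.0, 2.5], [1.0, 0.5]]
generalized = assign(larger)
```

The point of the sketch is that moving from `small` to `larger` changes only the size of the score tables and the problem being solved, which is the mechanism behind the zero-shot generalization claim. A realistic implementation would replace the enumeration with an actual LP or QP solver, since the search space here grows exponentially with the number of agents.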