Scalable multi-agent reinforcement learning


Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Neural Information Processing Systems

It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues because the sizes of the state and action spaces grow exponentially with the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near-optimal localized policy for optimizing the average reward, with complexity scaling with the state-action space size of local neighborhoods rather than that of the entire network. Our result centers around identifying and exploiting an exponential decay property which ensures that the effect of agents on each other decays exponentially fast in their graph distance.
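The scalability claim above can be made concrete with a toy calculation. The sketch below (illustrative only; the graph, state sizes, and function names are assumptions, not the authors' implementation) contrasts the size of a Q-table over the full network with one truncated to a κ-hop neighborhood on a line graph of agents:

```python
# Hypothetical illustration of the "local neighborhood" complexity claim:
# on a line graph of n agents, a table over the joint state-action space is
# exponential in n, while a table over one agent's kappa-hop neighborhood
# depends only on the neighborhood size.

def neighborhood(i, kappa, n):
    """Agents within graph distance kappa of agent i on a line graph."""
    return [j for j in range(n) if abs(j - i) <= kappa]

def local_table_size(i, kappa, n, n_states, n_actions):
    """Entries in agent i's truncated Q-table over its kappa-hop neighborhood."""
    k = len(neighborhood(i, kappa, n))
    return (n_states * n_actions) ** k

n, n_states, n_actions = 20, 2, 2
full_table = (n_states * n_actions) ** n                  # 4**20, exponential in n
local_table = local_table_size(10, 1, n, n_states, n_actions)  # (2*2)**3 = 64
```

With 20 binary-state, binary-action agents, the global table has 4^20 ≈ 10^12 entries, while each agent's 1-hop table has only 64, which is the kind of gap the exponential decay property is used to justify exploiting.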


Review for NeurIPS paper: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Neural Information Processing Systems

Strengths: The novelty of the paper is to provide a scalable learning method for average-reward settings with a guarantee of small performance loss. The work is relevant to a number of real-world applications such as social networks, communication networks, transportation networks, etc. The following are the highlights of the paper: the problem formulation is clear, and despite the many variables in the proofs, the mathematical notation is wisely chosen and unambiguous. The proofs appear to be correct, and I liked the way a few assumptions are used to provide theoretical guarantees. Overall I think the paper should be accepted for publication.


Review for NeurIPS paper: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Neural Information Processing Systems

The paper has been extensively discussed, and the reviewers agree the paper has merit; the rebuttal brings a lot of clarification on a number of questions identified by the reviewers (e.g., the difference between the underlying framework of the proposed method and that of mean-field RL). The general consensus is to propose acceptance of the paper; the reviewers would like the authors to clarify the following point, though: contrary to their claim, Theorem 2 does not really depend on Theorem 1, as it only assumes the exponential decay property, for which Theorem 1 merely broadens the class of problems where it holds.



Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Krnjaic, Aleksandar, Steleac, Raul D., Thomas, Jonathan D., Papoudakis, Georgios, Schäfer, Lukas, To, Andrew Wing Keung, Lao, Kuan-Ho, Cubuktepe, Murat, Haley, Matthew, Börsting, Peter, Albrecht, Stefano V.

arXiv.org Artificial Intelligence

We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.
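The manager-worker decomposition described above can be sketched in a few lines. The classes and the nearest-shelf assignment rule below are illustrative assumptions for a 1-D aisle, not the authors' algorithm; in the paper, both policies would be learned and co-trained against the global pick rate rather than hand-coded:

```python
# Toy sketch of a hierarchical manager-worker setup: the manager assigns a
# goal (shelf position) to each worker; each worker moves one step per tick
# toward its assigned goal along a 1-D aisle.

class Manager:
    def assign_goals(self, workers, shelves):
        """Naive goal assignment: each worker claims the nearest free shelf."""
        goals, free = {}, list(shelves)
        for name, pos in workers.items():
            shelf = min(free, key=lambda s: abs(s - pos))
            goals[name] = shelf
            free.remove(shelf)
        return goals

class Worker:
    def act(self, pos, goal):
        """Greedy low-level policy: step toward the assigned goal."""
        return pos + (1 if goal > pos else -1 if goal < pos else 0)

workers = {"r1": 0, "r2": 9}                      # robot positions in the aisle
goals = Manager().assign_goals(workers, shelves=[3, 8])
worker = Worker()
while workers != goals:                           # tick until all goals reached
    workers = {k: worker.act(p, goals[k]) for k, p in workers.items()}
```

The point of the hierarchy is that the manager reasons over a small goal space while workers handle low-level movement, which is what makes co-training the two levels tractable.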


Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation

Nayak, Siddharth, Choi, Kenneth, Ding, Wenqi, Dolan, Sydney, Gopalakrishnan, Karthik, Balakrishnan, Hamsa

arXiv.org Artificial Intelligence

We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm. In such cases, multiple agents may need to work together and share information in order to accomplish the task (Tan, 1993b). Naïve extensions of single-agent RL algorithms to multi-agent settings do not work well because of the non-stationarity in the environment, i.e., the actions of one agent affect the actions of others (Tan, 1993a; Tampuu et al., 2015). Furthermore, tasks may require cooperation among the agents. Classical approaches to optimal planning may (1) be computationally intractable, especially for real-time applications, and (2) be unable to account for complex interactions and shared objectives between multiple agents. The ability of RL to learn by trial-and-error makes it well-suited for problems in which optimization-based methods are not
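The neighborhood-aggregation idea can be illustrated with a single round of mean aggregation over a graph. This is a minimal sketch of the general mechanism, not the InforMARL architecture itself (which uses a learned graph neural network); the function name and toy graph are assumptions:

```python
import numpy as np

# One round of mean aggregation: each agent's feature vector is averaged with
# those of its graph neighbors (plus itself) to produce a local embedding that
# both an actor and a critic could consume.

def aggregate(features, adjacency):
    """Mean-aggregate each agent's features over its local neighborhood."""
    n = len(features)
    out = np.zeros_like(features)
    for i in range(n):
        nbrs = [j for j in range(n) if adjacency[i][j]] + [i]  # include self
        out[i] = features[nbrs].mean(axis=0)
    return out

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # one row per agent
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])       # line graph: 0-1-2
emb = aggregate(feats, adj)
```

Because each embedding depends only on an agent's neighborhood, the same aggregation works unchanged as the number of agents grows, which is what makes this style of architecture compatible with any standard MARL algorithm.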