
Robust multi-agent reinforcement learning


Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping

Liu, Guangyi, Iloglu, Suzan, Caldara, Michael, Durham, Joseph W., Zavlanos, Michael M.

arXiv.org Artificial Intelligence

In Amazon robotic sortation warehouses, mobile robots are deployed to transport and sort packages efficiently to different destinations [1, 2, 3, 4, 5]. The sorting process begins at induction stations, where packages are loaded onto mobile robots and subsequently transported to designated eject chutes based on their destinations (Figure 1). A critical factor determining the package throughput capacity of these facilities is the effective allocation of eject chutes to different destinations. Therefore, the destination-to-chute mapping policy plays a crucial role in optimizing the overall throughput performance of the robotic sortation warehouse. Our previous work [6] addresses the destination assignment problem (DAP) [7] in robotic sorting systems by developing a dynamic chute mapping policy. This policy determines the optimal allocation of eject chutes to destinations with the objective of minimizing the number of unsorted packages. We proposed a model-free reinforcement learning approach that dynamically adjusts the number of chutes assigned to each destination throughout the day. Our solution formulates the chute mapping problem within a Multi-Agent Reinforcement Learning (MARL) framework [8, 9, 10, 11], where each destination is represented as an agent that controls its chute allocation at each time step.
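To make the MARL formulation concrete, here is a minimal toy sketch, not the paper's system: each destination acts as an independent learning agent that chooses how many chutes to request, and a shared score counts unsorted packages for the joint allocation. All constants, dynamics, and the stateless bandit-style update are illustrative assumptions.

```python
import random

# Toy sketch (illustrative only): each destination is an agent choosing
# how many chutes to request; the team reward is the negative count of
# unsorted packages under the joint allocation.
N_DEST, N_CHUTES, ACTIONS = 3, 6, [1, 2, 3]   # chutes an agent may request
ARRIVALS = [30, 60, 90]                        # packages per step, per destination
CAPACITY = 25                                  # packages one chute sorts per step

def unsorted(alloc):
    # If agents over-request, scale allocations down to the physical chute count.
    scale = min(1.0, N_CHUTES / sum(alloc))
    return sum(max(0.0, arr - CAPACITY * n * scale)
               for arr, n in zip(ARRIVALS, alloc))

Q = [[0.0] * len(ACTIONS) for _ in range(N_DEST)]
random.seed(0)
for step in range(2000):
    eps = max(0.05, 1.0 - step / 1000)         # decaying exploration
    idx = [random.randrange(len(ACTIONS)) if random.random() < eps
           else max(range(len(ACTIONS)), key=lambda i: Q[d][i])
           for d in range(N_DEST)]
    reward = -unsorted([ACTIONS[i] for i in idx])     # shared team reward
    for d in range(N_DEST):
        Q[d][idx[d]] += 0.1 * (reward - Q[d][idx[d]])  # stateless value update

best = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q[d][i])]
        for d in range(N_DEST)]
print(best)
```

The real problem is stateful and time-varying (allocations change throughout the day), which is why the paper uses a full MARL formulation rather than the stateless update above.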


Review for NeurIPS paper: Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Neural Information Processing Systems

Weaknesses: - The biggest weakness of this paper in my mind is the clarity and framing. The paper motivates the contribution by stating that agents may not have access to the reward functions / models of other agents. For example, the paper states: "In many practical applications, the agents may not have perfect information of the model, i.e., the reward function and/or the transition probability model. For example, in an urban traffic network that involves multiple self-driving cars, each vehicle makes an individual action and has no access to other cars' rewards and models." However, most MARL methods don't make any assumptions about the reward function of other agents, particularly in the decentralized MARL setting.


Review for NeurIPS paper: Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Neural Information Processing Systems

The authors' feedback resolved some of the concerns raised by the reviewers. Unfortunately, we have not been able to reach a consensus, so this paper is borderline. On the positive side, the paper introduces a new and interesting MARL framework and provides both theoretical and practical contributions. On the negative side, the authors should better highlight from the beginning what is already present in the state of the art and where their contributions start from, and they should provide a more extensive and accurate empirical evaluation. The requested changes are quite significant, but given the authors' rebuttal, I feel they can fix these issues and so I suggest acceptance.


Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Neural Information Processing Systems

Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real-world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work, we show that we can gain robustness by controlling a policy's Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Motivated by these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization.
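A minimal sketch of the adversarial-regularization idea, in the spirit of (but not reproducing) ERNIE: penalize how far the policy's action distribution moves when the observation is perturbed within a small ball, which bounds the policy's effective Lipschitz constant around training states. The tiny linear policy and the random-search inner maximization are illustrative assumptions; projected gradient ascent would be the usual choice for the inner problem.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))          # toy linear policy: logits = W @ s

def policy(s):
    # Softmax over action logits.
    z = W @ s
    e = np.exp(z - z.max())
    return e / e.sum()

def adversarial_penalty(s, eps=0.1, trials=32):
    # Approximate max over ||delta|| <= eps of ||pi(s + delta) - pi(s)||
    # by random search (PGD is the standard alternative).
    base = policy(s)
    deltas = rng.normal(size=(trials, s.size))
    deltas *= eps / np.linalg.norm(deltas, axis=1, keepdims=True)
    return max(np.linalg.norm(policy(s + d) - base) for d in deltas)

s = rng.normal(size=8)
pen = adversarial_penalty(s)
# Training would minimize: task_loss + lambda_reg * pen
print(round(pen, 4))
```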


Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Neural Information Processing Systems

In this work, we study the problem of multi-agent reinforcement learning (MARL) with model uncertainty, which is referred to as robust MARL. This is naturally motivated by some multi-agent applications where each agent may not have perfectly accurate knowledge of the model, e.g., all the reward functions of other agents. Little prior work on MARL has accounted for such uncertainties, neither in problem formulation nor in algorithm design. In contrast, we model the problem as a robust Markov game, where the goal of all agents is to find policies such that no agent has the incentive to deviate, i.e., reach some equilibrium point, which is also robust to the possible uncertainty of the MARL model. We first introduce the solution concept of robust Nash equilibrium in our setting, and develop a Q-learning algorithm to find such equilibrium policies, with convergence guarantees under certain conditions.
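The worst-case operator at the heart of such robust Q-learning can be sketched in a deliberately simplified single-agent form (the paper's setting is a robust Markov game with equilibrium policies, which this toy does not model). Here the reward uncertainty set is assumed to be an interval around a nominal reward, so the inner minimization is just the lower endpoint; all sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA = 3, 2
gamma, alpha, rho = 0.9, 0.1, 0.2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # nominal transition kernel
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # nominal rewards

Q = np.zeros((nS, nA))
s = 0
for t in range(5000):
    # Epsilon-greedy action selection.
    a = rng.integers(nA) if rng.random() < 0.1 else int(Q[s].argmax())
    s2 = rng.choice(nS, p=P[s, a])
    # Robust target: minimize reward over the interval set [R - rho, R + rho].
    worst_r = R[s, a] - rho
    Q[s, a] += alpha * (worst_r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(Q.round(2))
```

The learned Q-values hedge against the worst admissible reward, so they lower-bound the nominal values; the multi-agent version replaces the scalar max with a best response against the other agents' (uncertain) policies.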


Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning

Shi, Laixi, Gai, Jingchu, Mazumdar, Eric, Chi, Yuejie, Wierman, Adam

arXiv.org Machine Learning

Standard multi-agent reinforcement learning (MARL) algorithms are vulnerable to sim-to-real gaps. To address this, distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL by optimizing the worst-case performance when game dynamics shift within a prescribed uncertainty set. Solving RMGs remains under-explored, from problem formulation to the development of sample-efficient algorithms. A notorious yet open challenge is whether RMGs can escape the curse of multiagency, where the sample complexity scales exponentially with the number of agents. In this work, we propose a natural class of RMGs where the uncertainty set of each agent is shaped by both the environment and other agents' strategies in a best-response manner. We first establish the well-posedness of these RMGs by proving the existence of game-theoretic solutions such as robust Nash equilibria and coarse correlated equilibria (CCE). Assuming access to a generative model, we then introduce a sample-efficient algorithm for learning the CCE whose sample complexity scales polynomially with all relevant parameters. To the best of our knowledge, this is the first algorithm to break the curse of multiagency for RMGs.
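The inner step of a distributionally robust Bellman operator can be sketched for the common total-variation uncertainty set: minimize the expected next-state value over distributions within a TV ball around the nominal kernel, which amounts to shifting up to `delta` probability mass from high-value states to the lowest-value state. This closed-form mass-shifting solution is a standard construction and does not model the paper's agent-dependent, best-response-shaped uncertainty sets.

```python
import numpy as np

def worst_case_expectation(p, v, delta):
    # min over {q : TV(q, p) <= delta} of q @ v, solved by moving up to
    # `delta` mass from the highest-value states to the lowest-value state.
    p, v = p.astype(float).copy(), np.asarray(v, float)
    vmin = v.min()
    budget = delta
    for i in np.argsort(v)[::-1]:            # visit high-value states first
        if v[i] <= vmin or budget <= 0:
            break
        move = min(p[i], budget)
        p[i] -= move
        p[np.argmin(v)] += move
        budget -= move
    return float(p @ v)

p = np.array([0.5, 0.3, 0.2])
v = np.array([10.0, 4.0, 1.0])
print(worst_case_expectation(p, v, 0.2))     # nominal expectation is 6.4
```

With `delta = 0.2`, 0.2 of the mass on the value-10 state moves to the value-1 state, giving 4.6 instead of the nominal 6.4; iterating this operator inside value iteration yields the robust value function.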


Safe and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles under State Perturbations

Zhang, Zhili, Sun, Yanchao, Huang, Furong, Miao, Fei

arXiv.org Artificial Intelligence

Sensing and communication technologies have enhanced learning-based decision-making methodologies for multi-agent systems such as connected autonomous vehicles (CAVs). However, most existing safe reinforcement learning based methods assume accurate state information. It remains challenging to achieve safety requirements under state uncertainties for CAVs, considering the noisy sensor measurements and the vulnerability of communication channels. In this work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Both a robust MARL algorithm and a control barrier function (CBF)-based safety shield are used in our approach to cope with perturbed or uncertain state inputs. The former trains the robust policy with a worst-case Q-function regularization module that pursues a higher lower-bounded reward, while the latter, the robust CBF safety shield, accounts for CAVs' collision-free constraints in complicated driving scenarios even with perturbed vehicle state information. We validate the advantages of SR-MAPPO in robustness and safety and compare it with baselines under different driving and state perturbation scenarios in the CARLA simulator. The SR-MAPPO policy is verified to maintain higher safety rates and efficiency (reward) when threatened by both state perturbations and unconnected vehicles' dangerous behaviors.
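The safety-shield idea can be illustrated with a minimal discrete-time CBF filter in one dimension; this is a toy, not the paper's SR-MAPPO implementation. The RL policy proposes a velocity command and the shield minimally modifies it so the barrier condition h(x_next) >= (1 - gamma_cbf) * h(x) keeps the state inside the safe set h(x) = x_max - x >= 0. All constants and the single-integrator dynamics are illustrative assumptions.

```python
X_MAX, DT, GAMMA_CBF = 10.0, 0.1, 0.5

def h(x):
    # Barrier function: the state is safe iff h(x) >= 0.
    return X_MAX - x

def shield(x, u_rl):
    # With x_next = x + u * DT, the condition h(x_next) >= (1 - GAMMA_CBF) * h(x)
    # rearranges to u <= GAMMA_CBF * h(x) / DT.
    u_max = GAMMA_CBF * h(x) / DT
    return min(u_rl, u_max)     # minimal modification of the RL action

x = 9.5
for _ in range(20):
    u = shield(x, u_rl=5.0)     # the policy always pushes toward the boundary
    x += u * DT
assert h(x) >= 0                # the shield never lets the state leave the safe set
print(round(x, 3))
```

Even though the unshielded action would cross the boundary in one step, the filtered trajectory only approaches it asymptotically; in the paper this filtering is done against perturbed state estimates, which requires a robust version of the barrier condition.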