Balakrishnan, Hamsa
Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication
Dolan, Sydney, Nayak, Siddharth, Aloor, Jasmine Jerry, Balakrishnan, Hamsa
Communication is crucial in cooperative multi-agent systems with partial observability, as it enables a better understanding of the environment and improves coordination. In extreme environments such as those underwater or in space, the frequency of communication between agents is often limited [1, 2]. For example, a satellite may not be able to reliably receive and react to messages from other satellites synchronously due to limited onboard power and communication delays. In these scenarios, agents aim to establish a communication protocol that allows them to operate independently while still receiving sufficient information to effectively coordinate with nearby agents. Multi-agent reinforcement learning (MARL) has emerged as a popular approach for addressing cooperative navigation challenges involving multiple agents.
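To make the limited-communication setting concrete, below is a minimal sketch (not the paper's learned protocol) in which an agent only refreshes neighbor information every few steps and otherwise acts on stale messages. The class name, communication interval, and repulsion-based controller are all illustrative assumptions.

```python
import numpy as np

class AsyncCommAgent:
    """Toy agent that refreshes neighbor information only every
    `comm_interval` steps and acts on stale messages in between."""

    def __init__(self, agent_id, comm_interval=5):
        self.agent_id = agent_id
        self.comm_interval = comm_interval   # limited communication frequency
        self.stale_neighbors = {}            # last received neighbor positions

    def maybe_receive(self, step, neighbor_positions):
        """neighbor_positions: dict {id: np.array of shape (2,)} broadcast this step."""
        # Messages are only accepted on a sparse schedule.
        if step % self.comm_interval == 0:
            self.stale_neighbors = {i: p.copy() for i, p in neighbor_positions.items()}

    def act(self, position, goal):
        """Head toward the goal, repelled by last-known (possibly stale) neighbor positions."""
        direction = goal - position
        for p in self.stale_neighbors.values():
            offset = position - p
            dist = np.linalg.norm(offset)
            if 1e-8 < dist < 1.0:            # avoid neighbors believed to be close
                direction += offset / dist
        return direction / (np.linalg.norm(direction) + 1e-8)

agent = AsyncCommAgent(agent_id=0, comm_interval=5)
agent.maybe_receive(step=0, neighbor_positions={1: np.array([1.0, 0.0])})
print(agent.act(position=np.array([0.0, 0.0]), goal=np.array([5.0, 5.0])))
```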
Cooperation and Fairness in Multi-Agent Reinforcement Learning
Aloor, Jasmine Jerry, Nayak, Siddharth, Dolan, Sydney, Balakrishnan, Hamsa
Multi-agent systems are trained to optimize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We use the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.
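As a small illustration of the fairness measure, the sketch below computes the reciprocal of the coefficient of variation of per-agent distances and a brute-force min-max (bottleneck) goal assignment on a toy instance. The function names and the brute-force search are assumptions of this sketch; in the paper, the assignment and navigation behavior are learned with MARL.

```python
import itertools
import numpy as np

def fairness(distances):
    """Reciprocal of the coefficient of variation of per-agent distances: mean / std.
    Higher is fairer (equal distances -> std near zero -> large value)."""
    d = np.asarray(distances, dtype=float)
    return d.mean() / (d.std() + 1e-8)

def min_max_fair_assignment(cost):
    """Brute-force min-max (bottleneck) assignment for a small n x n agent-to-goal
    distance matrix: minimize the largest distance any single agent must travel."""
    n = cost.shape[0]
    best_perm, best_max = None, np.inf
    for perm in itertools.permutations(range(n)):
        worst = max(cost[i, g] for i, g in enumerate(perm))
        if worst < best_max:
            best_max, best_perm = worst, perm
    return best_perm

# Example: 4 agents, 4 goals, random distance matrix.
rng = np.random.default_rng(0)
cost = rng.uniform(1.0, 10.0, size=(4, 4))
perm = min_max_fair_assignment(cost)
dists = [cost[i, g] for i, g in enumerate(perm)]
print("per-agent distances:", np.round(dists, 2), "fairness:", round(fairness(dists), 2))
```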
Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments
Nayak, Siddharth, Orozco, Adelmo Morrison, Have, Marina Ten, Thirumalai, Vittal, Zhang, Jackson, Chen, Darren, Kapoor, Aditya, Robinson, Eric, Gopalakrishnan, Karthik, Harrison, James, Ichter, Brian, Mahajan, Anuj, Balakrishnan, Hamsa
The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. However, LMs in their standard form face challenges with long-horizon tasks, particularly in partially observable multi-agent settings. We propose an LM-based Long-Horizon Planner for Multi-Agent Robotics (LLaMAR), a cognitive architecture for planning that achieves state-of-the-art results in long-horizon tasks within partially observable environments. LLaMAR employs a plan-act-correct-verify framework, allowing self-correction from action execution feedback without relying on oracles or simulators. Additionally, we present MAP-THOR, a comprehensive test suite encompassing household tasks of varying complexity within the AI2-THOR environment. Experiments show that LLaMAR achieves a 30% higher success rate compared to other state-of-the-art LM-based multi-agent planners.
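The plan-act-correct-verify idea can be sketched as a loop in which a language model proposes subtasks and actions, revises failed actions from execution feedback alone, and verifies completion from observations. The prompts, the environment interface (observe/step), and the Feedback type below are illustrative placeholders, not LLaMAR's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    success: bool
    message: str = ""

class DummyEnv:
    """Stand-in environment exposing the minimal interface this sketch assumes."""
    def observe(self):
        return "a cluttered kitchen"
    def step(self, action):
        return Feedback(success=True)

def llm(prompt):
    """Placeholder for a language-model call; answers 'yes' to yes/no questions."""
    return "yes" if "(yes/no)" in prompt else "pick up the nearest object"

def plan_act_correct_verify(task, env, max_steps=20):
    """Structural sketch of a plan-act-correct-verify loop."""
    completed = []
    for _ in range(max_steps):
        # Plan: propose the next subtask given what has been completed so far.
        subtask = llm(f"Task: {task}. Done: {completed}. Next subtask?")
        # Act: turn the subtask into a primitive action and execute it.
        action = llm(f"Subtask: {subtask}. Obs: {env.observe()}. Action?")
        feedback = env.step(action)
        # Correct: on failure, revise using execution feedback alone (no oracle or simulator).
        if not feedback.success:
            action = llm(f"Action '{action}' failed: {feedback.message}. Revised action?")
            feedback = env.step(action)
        # Verify: judge from observations whether the subtask is actually complete.
        if llm(f"Subtask: {subtask}. Obs: {env.observe()}. Complete? (yes/no)") == "yes":
            completed.append(subtask)
        # Stop once the whole task is judged complete.
        if llm(f"Task: {task}. Done: {completed}. All done? (yes/no)") == "yes":
            break
    return completed

print(plan_act_correct_verify("set the table", DummyEnv()))
```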
Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
Nayak, Siddharth, Choi, Kenneth, Ding, Wenqi, Dolan, Sydney, Gopalakrishnan, Karthik, Balakrishnan, Hamsa
We consider the problem of multi-agent navigation and collision avoidance when observations are limited to the local neighborhood of each agent. We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. Specifically, InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm.
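A minimal stand-in for the neighborhood-aggregation idea: each agent's embedding combines its own features with the mean of the features of agents inside its sensing radius, and that embedding can then feed both the actor and the critic. The single round of mean aggregation and the fixed weight matrices below are simplifications of InforMARL's graph neural network, not its architecture.

```python
import numpy as np

def neighborhood_embedding(positions, features, radius, W_self, W_msg):
    """One round of mean-aggregation message passing over the sensing graph:
    each agent combines its own features with the mean of its neighbors' features."""
    n = positions.shape[0]
    embeddings = np.zeros((n, W_self.shape[1]))
    for i in range(n):
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbors = np.where((dists < radius) & (dists > 0))[0]   # local neighborhood only
        msg = features[neighbors].mean(axis=0) if len(neighbors) else np.zeros_like(features[i])
        embeddings[i] = np.tanh(features[i] @ W_self + msg @ W_msg)
    return embeddings   # could be fed to both the actor and the critic of a MARL algorithm

# Example: 5 agents with 2-D positions, 4-D features, and 8-D embeddings.
rng = np.random.default_rng(0)
pos, feat = rng.uniform(0, 10, (5, 2)), rng.normal(size=(5, 4))
W_self, W_msg = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(neighborhood_embedding(pos, feat, radius=3.0, W_self=W_self, W_msg=W_msg).shape)
```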
Satellite Navigation and Coordination with Limited Information Sharing
Dolan, Sydney, Nayak, Siddharth, Balakrishnan, Hamsa
We explore space traffic management as an application of collision-free navigation in multi-agent systems where vehicles have limited observation and communication ranges. We investigate the effectiveness of transferring a collision avoidance multi-agent reinforcement learning (MARL) model trained in a ground environment to a space environment. We demonstrate that the transfer learning model outperforms a model that is trained directly on the space environment. Furthermore, we find that our approach works well even when we consider the perturbations to satellite dynamics caused by the Earth's oblateness. Finally, we show how our methods can be used to evaluate the benefits of information-sharing between satellite operators in order to improve coordination.
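The oblateness effect mentioned above is commonly modeled through the J2 zonal harmonic. The snippet below evaluates the standard J2 perturbation acceleration for an Earth-centered inertial position, which is the kind of term a space environment would add on top of two-body dynamics; how the paper's simulator incorporates it is an assumption of this sketch.

```python
import numpy as np

MU = 3.986004418e14      # Earth's gravitational parameter [m^3/s^2]
R_E = 6.378137e6         # Earth's equatorial radius [m]
J2 = 1.08263e-3          # second zonal harmonic (oblateness) coefficient

def j2_acceleration(r_eci):
    """Standard J2 perturbation acceleration [m/s^2] for an ECI position vector [m]."""
    x, y, z = r_eci
    r = np.linalg.norm(r_eci)
    k = -1.5 * J2 * MU * R_E**2 / r**5
    zr2 = 5.0 * z**2 / r**2
    return np.array([k * x * (1.0 - zr2),
                     k * y * (1.0 - zr2),
                     k * z * (3.0 - zr2)])

# Example: a satellite at 700 km altitude over the equator.
print(j2_acceleration(np.array([R_E + 700e3, 0.0, 0.0])))
```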
NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming
Kenworthy, Luke, Nayak, Siddharth, Chin, Christopher, Balakrishnan, Hamsa
Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite this modeling power, solving large-scale integer programs (IPs) remains a computational challenge in practice. The incorporation of more complex objectives, such as robustness to disruptions, further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE builds robust schedules in less than 2 seconds on average.
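A toy illustration of the coefficient-extraction idea: a learned model scores candidate pilot-to-slot assignments, and those scores become objective coefficients of the scheduling optimization. In the sketch below, the "learned" scores are random numbers and the integer program is replaced by a brute-force search so that it runs standalone; NICE itself trains the scoring network with reinforcement learning and solves a full IP.

```python
import itertools
import numpy as np

def learned_objective_coefficients(n_pilots, n_slots, seed=0):
    """Stand-in for the coefficient extraction step: in NICE, an RL-trained network
    scores how robust each pilot-to-slot assignment is expected to be; here the
    scores are random so the sketch is self-contained."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_pilots, n_slots))

def solve_assignment(scores):
    """Toy stand-in for the integer program: maximize the summed learned coefficients
    subject to one pilot per slot, solved by brute force instead of an IP solver."""
    n = scores.shape[0]
    best_perm, best_val = None, -np.inf
    for perm in itertools.permutations(range(n)):
        val = sum(scores[p, s] for p, s in enumerate(perm))
        if val > best_val:
            best_val, best_perm = val, perm
    return best_perm, best_val

scores = learned_objective_coefficients(n_pilots=4, n_slots=4)
print(solve_assignment(scores))
```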