
Collaborating Authors

 Aloor, Jasmine Jerry


Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

arXiv.org Artificial Intelligence

Communication is crucial in cooperative multi-agent systems with partial observability, as it enables a better understanding of the environment and improves coordination. In extreme environments such as those underwater or in space, the frequency of communication between agents is often limited [1, 2]. For example, a satellite may not be able to reliably receive and react to messages from other satellites synchronously due to limited onboard power and communication delays. In these scenarios, agents aim to establish a communication protocol that allows them to operate independently while still receiving sufficient information to effectively coordinate with nearby agents. Multi-agent reinforcement learning (MARL) has emerged as a popular approach for addressing cooperative navigation challenges involving multiple agents.
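To make the limited-communication setting above concrete, the Python sketch below models a toy rate-limited channel in which agents receive fresh messages only every few environment steps and must otherwise act on stale information. The class, its interface, and the broadcast-to-all delivery rule are illustrative assumptions, not the protocol learned in the paper.

import numpy as np

class RateLimitedChannel:
    """Toy model of limited communication frequency between agents.

    Every agent broadcasts a message each step, but receivers only get
    fresh messages once every `period` steps; in between they keep acting
    on the last (possibly stale) message they received. Illustrative only.
    """

    def __init__(self, n_agents, msg_dim, period):
        self.period = period
        self.step_count = 0
        # last_received[i, j] = last message agent i received from agent j
        self.last_received = np.zeros((n_agents, n_agents, msg_dim))

    def step(self, outgoing):
        # outgoing: array of shape (n_agents, msg_dim) with current messages
        outgoing = np.asarray(outgoing, dtype=float)
        if self.step_count % self.period == 0:
            # delivery step: everyone receives everyone's latest message
            self.last_received = np.broadcast_to(
                outgoing[np.newaxis, :, :], self.last_received.shape
            ).copy()
        self.step_count += 1
        return self.last_received

In a training loop, the returned (receiver, sender, message) tensor could be concatenated with each agent's local observation, so that policies are learned under the assumption that most incoming messages may be outdated.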


Cooperation and Fairness in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Multi-agent systems are trained to optimize shared cost objectives, which typically reflect system-level efficiency. However, in the resource-constrained environments of mobility and transportation systems, efficiency may be achieved at the expense of fairness -- certain agents may incur significantly greater costs or lower rewards compared to others. Tasks could be distributed inequitably, leading to some agents receiving an unfair advantage while others incur disproportionately high costs. It is important to consider the tradeoffs between efficiency and fairness. We consider the problem of fair multi-agent navigation for a group of decentralized agents using multi-agent reinforcement learning (MARL). We use the reciprocal of the coefficient of variation of the distances traveled by different agents as a measure of fairness and investigate whether agents can learn to be fair without significantly sacrificing efficiency (i.e., increasing the total distance traveled). We find that by training agents using min-max fair distance goal assignments along with a reward term that incentivizes fairness as they move towards their goals, the agents (1) learn a fair assignment of goals and (2) achieve almost perfect goal coverage in navigation scenarios using only local observations. For goal coverage scenarios, we find that, on average, our model yields a 14% improvement in efficiency and a 5% improvement in fairness over a baseline trained using random assignments. Furthermore, an average of 21% improvement in fairness can be achieved compared to a model trained on optimally efficient assignments; this increase in fairness comes at the expense of only a 7% decrease in efficiency. Finally, we extend our method to environments in which agents must complete coverage tasks in prescribed formations and show that it is possible to do so without tailoring the models to specific formation shapes.
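As a concrete illustration of the fairness measure described above, the short Python sketch below computes the reciprocal of the coefficient of variation of the per-agent distances traveled; the function name and example values are made up for illustration and are not taken from the paper's code.

import numpy as np

def fairness_metric(distances):
    """Reciprocal of the coefficient of variation of per-agent distances.

    Larger values mean the agents traveled more similar distances,
    i.e., the outcome is fairer under this measure.
    """
    d = np.asarray(distances, dtype=float)
    mean, std = d.mean(), d.std()
    if std == 0.0:        # every agent traveled the same distance
        return np.inf     # perfectly fair under this measure
    return mean / std     # 1 / (std / mean)

# Three agents with similar distances score higher (fairer) than three
# agents where one travels far more than the others.
print(fairness_metric([10.0, 11.0, 9.5]))   # roughly 16
print(fairness_metric([2.0, 3.0, 25.0]))    # roughly 0.9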


Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

arXiv.org Artificial Intelligence

Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies is a critical requirement to enable the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) as a means of integrating STL specification into a vanilla LfD policy to improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. While the domain-independent method can be applied to integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to the domain of planning trajectories for General Aviation aircraft around a non-towered airfield. Results using the simulator trained on real-world data showcase 60% improved performance over baseline LfD methods that do not use STL heuristics.
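To make the idea of biasing the tree search concrete, the sketch below augments a UCT-style selection score with an STL robustness bonus: branches whose trajectory prefixes better satisfy the spatio-temporal rules receive a higher score. The node structure, weights, and exact form of the bonus are illustrative assumptions rather than the paper's formulation.

import math
from dataclasses import dataclass

@dataclass
class Node:
    visits: int = 0
    total_value: float = 0.0   # sum of returns backed up through this node

def uct_score_with_stl(parent, child, robustness, c_explore=1.4, c_stl=0.5):
    """UCT selection score plus a bonus proportional to STL robustness.

    `robustness` is the STL robustness of the trajectory prefix reached
    through `child`: positive when the rules are satisfied, negative when
    they are violated, so violating branches are penalized.
    """
    exploit = child.total_value / max(child.visits, 1)
    explore = c_explore * math.sqrt(
        math.log(max(parent.visits, 1)) / max(child.visits, 1)
    )
    return exploit + explore + c_stl * robustness

# A rule-satisfying branch (positive robustness) can win selection even
# with a slightly lower average return.
root = Node(visits=10, total_value=6.0)
a, b = Node(visits=4, total_value=3.2), Node(visits=4, total_value=3.0)
print(uct_score_with_stl(root, a, robustness=-0.2))  # about 1.76
print(uct_score_with_stl(root, b, robustness=0.4))   # about 2.01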


Bounded Distance-control for Multi-UAV Formation Safety and Preservation in Target-tracking Applications

arXiv.org Artificial Intelligence

The notion of safety in multi-agent systems assumes great significance in many emerging collaborative multi-robot applications. In this paper, we present a multi-UAV collaborative target-tracking application by defining bounded inter-UAV distances in the formation in order to ensure safe operation. In doing so, we address the problem of prioritizing specific objectives over others in a multi-objective control framework. We propose a barrier Lyapunov function-based distributed control law to enforce the bounds on the distances and assess its Lyapunov stability using a kinematic model. The theoretical analysis is supported by numerical results, which account for measurement noise and moving targets. Straight-line and circular motion of the target are considered, and results for quadratic Lyapunov function-based control, often used in multi-agent multi-objective problems, are also presented. A comparison of the two control approaches elucidates the advantages of our proposed safe-control in bounding the inter-agent distances in a formation. A concluding evaluation using ROS simulations illustrates the practical applicability of the proposed control to a pair of multi-rotors visually estimating and maintaining their mutual separation within specified bounds, as they track a moving target.
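For intuition on how a barrier Lyapunov function keeps the inter-UAV distance within specified bounds, the sketch below uses a log-type barrier that is zero at the middle of the safe interval and grows without bound as the distance approaches either limit, together with a gradient-based rate command for a simple kinematic (single-integrator) separation model. The specific barrier form and gain are assumptions for illustration, not the control law derived in the paper.

import numpy as np

def barrier_lyapunov(d, d_min, d_max):
    """Log-type barrier: zero at the interval midpoint, +inf at either bound.

    Keeping this value small keeps the inter-UAV distance d strictly
    inside (d_min, d_max). Illustrative choice of barrier function.
    """
    assert d_min < d < d_max, "distance must start inside the safe interval"
    width = d_max - d_min
    return -np.log(4.0 * (d - d_min) * (d_max - d) / width**2)

def barrier_gradient(d, d_min, d_max):
    """dV/dd; its magnitude blows up near the bounds, pushing d back inside."""
    return -1.0 / (d - d_min) + 1.0 / (d_max - d)

def distance_rate_command(d, d_min, d_max, k=1.0):
    """Gradient-descent rate command for the separation distance.

    For a single-integrator separation model d_dot = u, choosing
    u = -k * dV/dd gives V_dot = -k * (dV/dd)**2 <= 0, so the barrier
    value is non-increasing and the bounds are never crossed.
    """
    return -k * barrier_gradient(d, d_min, d_max)

# Example: separation must stay between 2 m and 6 m.
print(distance_rate_command(5.5, 2.0, 6.0))  # negative: pull back from the upper bound
print(distance_rate_command(2.3, 2.0, 6.0))  # positive: push away from the lower bound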