Agents
Multi-robot coordination for connectivity recovery after unpredictable environment changes
Marchukov, Yaroslav, Montano, Luis
In this paper we develop a distributed method to reconnect a multi-robot team after connectivity failures caused by unpredictable environment changes, i.e., the appearance of new obstacles. After the changes, the team is divided into different groups of robots. The groups have a limited communication range and only partial information, within their field of view, about the current scenario. Their objective is to form a chain from a static base station to a goal location. In the proposed distributed replanning approach, each robot uses the information it newly observes in the changed scenario to predict new plans for the other groups, in order to restore connectivity with the base station and reach the initial joint objective. If a solution exists, the method achieves the reconnection of all the groups in a single chain. The proposed method is compared with two other cases: 1) when all the agents have full information about the environment, and 2) when some robots must move to reach other waiting robots for reconnection. Numerical simulations are provided to evaluate the proposed approach in the presence of unpredictable scenario changes.
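The connectivity requirement described above (a multi-hop chain from the base station to the goal under a limited communication range) can be illustrated with a small check. This is a minimal sketch and not the authors' replanning method: the function names, the disc communication model, and the `goal_tol` parameter are assumptions made for illustration, and obstacles are ignored.

```python
from collections import deque
from math import dist

def connected_to_base(positions, base, comm_range):
    """Return indices of robots that reach the base through multi-hop links.

    Assumption: two nodes can communicate when their Euclidean distance is
    at most comm_range (obstacles are not modelled in this sketch).
    """
    nodes = [base] + list(positions)  # node 0 is the base station
    reached = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(nodes):
            if j not in reached and dist(nodes[i], p) <= comm_range:
                reached.add(j)
                queue.append(j)
    # Shift indices back so they refer to the robots, not the node list.
    return sorted(j - 1 for j in reached if j > 0)

def chain_reaches_goal(positions, base, goal, comm_range, goal_tol=0.1):
    """True if a robot linked to the base sits within goal_tol of the goal."""
    linked = connected_to_base(positions, base, comm_range)
    return any(dist(positions[i], goal) <= goal_tol for i in linked)

# A new obstacle has left the third robot stranded, out of range of the rest.
broken = [(1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
# After replanning, the third robot closes the gap and sits on the goal.
repaired = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
```

With `base=(0.0, 0.0)` and `comm_range=1.5`, only the first two robots of `broken` are linked to the base, while all three robots of `repaired` form a chain that ends at the goal `(3.0, 0.0)`.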
Multi-agent coordination for on-demand data gathering with periodic information upload
Marchukov, Yaroslav, Montano, Luis
In this paper we develop a method for planning and coordinating a multi-agent team deployment to periodically gather information on demand. A static operation center (OC) periodically requests information from changing goal locations. The objective is to gather data at the goals and deliver it to the OC, balancing the refreshing time and the total number of information packages. The system automatically splits the team into two roles: workers, which gather data, and collectors, which retransmit the data to the OC. The proposed three-step method: 1) finds the best area partition for the workers; 2) obtains the best balance between workers and collectors, and determines with whom each worker must communicate, a collector or the OC; 3) computes the best tour for the workers to visit the goals and deliver the data to the OC or to a moving collector. The method is tested in simulations in different scenarios, providing the best area partition algorithm and the best balance between collectors and workers.
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Wan, Ziyu, Li, Yunxiang, Song, Yan, Wang, Hanjing, Yang, Linyi, Schmidt, Mark, Wang, Jun, Zhang, Weinan, Hu, Shuyue, Wen, Ying
Recent research on Reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking -- enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Through iterative reinforcement learning with aligned objectives, these agents explore and learn collaboration, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competitive-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each distinct agent, providing valuable insights into how the meta-thinking reasoning process enhances the reasoning capabilities of LLMs.
Learning Closed-Loop Parametric Nash Equilibria of Multi-Agent Collaborative Field Coverage
Chen, Jushan, Paternain, Santiago
Multi-agent reinforcement learning is a challenging and active field of research due to the inherent nonstationary property and coupling between agents. A popular approach to modeling the multi-agent interactions underlying the multi-agent RL problem is the Markov Game. There is a special type of Markov Game, termed Markov Potential Game, which allows us to reduce the Markov Game to a single-objective optimal control problem where the objective function is a potential function. In this work, we prove that a multi-agent collaborative field coverage problem, which is found in many engineering applications, can be formulated as a Markov Potential Game, and we can learn a parameterized closed-loop Nash Equilibrium by solving an equivalent single-objective optimal control problem. As a result, our algorithm is 10x faster during training compared to a game-theoretic baseline and converges faster during policy execution.
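For readers unfamiliar with the term, the reduction mentioned above rests on the defining property of a potential game. The following is the commonly used policy-level definition, written with assumed notation ($J_i$ for agent $i$'s expected return, $\pi_{-i}$ for the policies of all other agents, $\Phi$ for the potential); the paper's own parametric, closed-loop formulation may differ in its details.

```latex
% A Markov game is a Markov Potential Game if there exists a potential
% function \Phi such that, for every agent i, every pair of policies
% \pi_i, \pi_i' and every fixed \pi_{-i}:
\[
  J_i(\pi_i, \pi_{-i}) - J_i(\pi_i', \pi_{-i})
  \;=\;
  \Phi(\pi_i, \pi_{-i}) - \Phi(\pi_i', \pi_{-i}).
\]
% Consequently, any joint policy that maximizes \Phi admits no profitable
% unilateral deviation, i.e. it is a Nash equilibrium:
\[
  \pi^\star \in \arg\max_{\pi} \Phi(\pi)
  \;\Longrightarrow\;
  J_i(\pi_i^\star, \pi_{-i}^\star) \ge J_i(\pi_i, \pi_{-i}^\star)
  \quad \text{for all } i \text{ and all } \pi_i .
\]
```

Because a unilateral change in any agent's return equals the change in $\Phi$, maximizing the single objective $\Phi$ is enough to find an equilibrium, which is the sense in which the game reduces to a single-objective optimal control problem.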
Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks
Gosmar, Diego, Dahl, Deborah A., Gosmar, Dario
Recent advances in generative AI have enabled increasingly sophisticated applications in various domains, from customer service chatbots to automated content generation. However, alongside these advancements, the vulnerability of large language models (LLMs) to adversarial inputs has emerged as a critical concern. Among these, prompt injection attacks pose a particularly insidious challenge, as they exploit the model's inherent instruction-following behavior to override intended constraints. While prompt injection is often discussed in theoretical contexts, its impact on deployed AI systems has been observed in practical settings. Research has demonstrated that even models with reinforced safety mechanisms--or with specific knowledge based on retrieval-augmented generation (RAG)--can be manipulated into disclosing sensitive data, executing unauthorized instructions, or producing harmful content [4].
SPECTra: Scalable Multi-Agent Reinforcement Learning with Permutation-Free Networks
Park, Hyunwoo, Seong, Baekryun, Ko, Sang-Ki
In cooperative multi-agent reinforcement learning (MARL), the permutation problem, where the state space grows exponentially with the number of agents, reduces sample efficiency. Additionally, many existing architectures struggle with scalability, relying on a fixed structure tied to a specific number of agents, which limits their applicability to environments with a variable number of entities. While approaches such as graph neural networks (GNNs) and self-attention mechanisms have made progress in addressing these challenges, they have significant limitations, as dense GNNs and self-attention mechanisms incur high computational costs. To overcome these limitations, we propose a novel agent network and a non-linear mixing network that ensure permutation-equivariance and scalability, allowing them to generalize to environments with various numbers of agents. Our agent network significantly reduces computational complexity, and our scalable hypernetwork enables efficient weight generation for non-linear mixing. Additionally, we introduce curriculum learning to improve training efficiency. Experiments on SMACv2 and Google Research Football (GRF) demonstrate that our approach achieves superior learning performance compared to existing methods. By addressing both permutation-invariance and scalability in MARL, our work provides a more efficient and adaptable framework for cooperative MARL. Our code is available at https://github.com/funny-rl/SPECTra.
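The two properties named above can be made concrete with a toy example. This is a minimal sketch and not the SPECTra architecture: `shared_layer` and `mean_pool` are hypothetical names, and each agent's observation is reduced to a single number for brevity. Applying the same weights to every agent gives permutation-equivariance, a symmetric pooling step gives permutation-invariance, and neither depends on the number of agents.

```python
from math import isclose

def shared_layer(obs, w, b):
    """Apply the same affine map and ReLU to every agent's feature.

    Reordering the agents reorders the outputs in the same way
    (permutation-equivariance), for any number of agents.
    """
    return [max(0.0, w * x + b) for x in obs]

def mean_pool(features):
    """Symmetric pooling: the result does not depend on agent ordering."""
    return sum(features) / len(features)

obs = [0.5, -1.0, 2.0, 0.25]           # one scalar feature per agent
perm = [2, 0, 3, 1]                    # an arbitrary reordering of agents
permuted_obs = [obs[k] for k in perm]

out = shared_layer(obs, w=1.5, b=0.1)
out_perm = shared_layer(permuted_obs, w=1.5, b=0.1)

# Equivariance: permuting the inputs permutes the outputs identically.
equivariant = out_perm == [out[k] for k in perm]
# Invariance: the pooled value is unchanged (up to float rounding).
invariant = isclose(mean_pool(out), mean_pool(out_perm), abs_tol=1e-12)
```

The same two functions accept a list of any length, which is the sense in which weight sharing plus symmetric pooling avoids tying the network to a fixed team size.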
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
This paper presents RAG-KG-IL, a novel multi-agent hybrid framework designed to enhance the reasoning capabilities of Large Language Models (LLMs) by integrating Retrieval-Augmented Generation (RAG) and Knowledge Graphs (KGs) with an Incremental Learning (IL) approach. Despite recent advancements, LLMs still face significant challenges in reasoning with structured data, handling dynamic knowledge evolution, and mitigating hallucinations, particularly in mission-critical domains. Our proposed RAG-KG-IL framework addresses these limitations by employing a multi-agent architecture that enables continuous knowledge updates, integrates structured knowledge, and incorporates autonomous agents for enhanced explainability and reasoning. The framework utilizes RAG to ensure the generated responses are grounded in verifiable information, while KGs provide structured domain knowledge for improved consistency and depth of understanding. The Incremental Learning approach allows for dynamic updates to the knowledge base without full retraining, significantly reducing computational overhead and improving the model's adaptability. We evaluate the framework using real-world case studies involving health-related queries, comparing it to state-of-the-art models like GPT-4o and a RAG-only baseline. Experimental results demonstrate that our approach significantly reduces hallucination rates and improves answer completeness and reasoning accuracy. The results underscore the potential of combining RAG, KGs, and multi-agent systems to create intelligent, adaptable systems capable of real-time knowledge integration and reasoning in complex domains.
Safe Multi-Robotic Arm Interaction via 3D Convex Shapes
Kaypak, Ali Umut, Wei, Shiqing, Krishnamurthy, Prashanth, Khorrami, Farshad
Inter-robot collisions pose a significant safety risk when multiple robotic arms operate in close proximity. We present an online collision avoidance methodology leveraging 3D convex shape-based High-Order Control Barrier Functions (HOCBFs) to address this issue. While prior works focused on using Control Barrier Functions (CBFs) for human-robotic arm and single-arm collision avoidance, we explore the problem of collision avoidance between multiple robotic arms operating in a shared space. In our methodology, we utilize the proposed HOCBFs as centralized and decentralized safety filters. These safety filters are compatible with any nominal controller and ensure safety without significantly restricting the robots' workspace. A key challenge in implementing these filters is the computational overhead caused by the large number of safety constraints and the computation of a Hessian matrix per constraint. We address this challenge by employing numerical differentiation methods to approximate computationally intensive terms. The effectiveness of our method is demonstrated through extensive simulation studies and real-world experiments with Franka Research 3 robotic arms.
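The idea of a safety filter that minimally modifies a nominal control can be sketched in a much simpler setting than the one described above. The example below is an illustration under stated assumptions, not the authors' method: it uses a first-order CBF (not a HOCBF), a point with single-integrator dynamics (not a robotic arm), and a spherical keep-out region (not general 3D convex shapes). With a single constraint, the usual quadratic program has a closed-form solution, which is what `cbf_filter` (a hypothetical name) implements.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = |x - c|^2 - r^2 stays >= 0.

    Assumed dynamics: x_dot = u. The CBF condition is
        grad_h(x) . u + alpha * h(x) >= 0,  with grad_h(x) = 2 (x - c).
    For one linear constraint a . u + b >= 0, the closest control to u_nom
    is u_nom itself if it is feasible, otherwise its projection onto the
    constraint boundary.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    h = sum(di * di for di in d) - radius ** 2
    a = [2.0 * di for di in d]
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + alpha * h
    if slack >= 0.0:
        return list(u_nom)  # nominal control already satisfies the condition
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("gradient vanishes at the obstacle center")
    # Project onto the half-space boundary a . u + alpha * h = 0.
    return [ui - slack * ai / norm_sq for ui, ai in zip(u_nom, a)]

# Heading straight at the obstacle: the filter slows the approach.
u_safe = cbf_filter(x=(2.0, 0.0, 0.0), u_nom=(-5.0, 0.0, 0.0),
                    center=(0.0, 0.0, 0.0), radius=1.0)
# Moving away from the obstacle: the nominal control passes through.
u_free = cbf_filter(x=(2.0, 0.0, 0.0), u_nom=(1.0, 0.0, 0.0),
                    center=(0.0, 0.0, 0.0), radius=1.0)
```

In the first call the nominal velocity of -5 along the first axis is reduced to -0.75, which is exactly the value at which the CBF condition holds with equality; the second call returns the nominal control unchanged. The filter never needs to know how `u_nom` was produced, which mirrors the compatibility with any nominal controller mentioned in the abstract.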
Research Vision: Multi-Agent Path Planning for Cops And Robbers Via Reactive Synthesis
Fishell, William, Rodriguez, Andoni, Santolucito, Mark
Reactive synthesis is classically modeled as a game, though often applied to domains such as arbiter circuits and communication protocols [1]. We aim to show how reactive synthesis can be applied to a literal game - cops and robbers - to generate strategies for agents in the game. We propose a game that requires the coordination of multiple agents in a space of datatypes and operations that are richer than is easily captured by the traditional Linear Temporal Logic (LTL) approach of synthesis over Boolean streams [2]. In particular, we draw inspiration from prior work on Coordination Synthesis [3], LTL modulo theories (LTLt) [4], and Temporal Stream Logic Modulo Theories (TSL-MT) [5, 6] to describe our problem and potential solution spaces. The traditional game [7] asks whether K cops can catch a single robber on a graph. In a temporal logic setting, this amounts to a safety condition on the robbers (they are never caught by the cops), and the dual liveness condition for the cops (they eventually catch the robbers). We modify the traditional graph-theory-focused version of the game to have a more visual game on a grid system, allowing for various configurations, including: An environment with various node types such as walls, safe zones, and open spaces.
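The safety and liveness conditions described above can be written as a pair of dual LTL formulas. This is a sketch with assumed notation: $\mathit{caught}$ is a hypothetical atomic proposition that holds whenever some cop occupies the same node as the robber, and $\mathbf{G}$ and $\mathbf{F}$ are the usual "always" and "eventually" operators.

```latex
% Robber's objective (safety): the robber is never caught.
\[
  \varphi_{\text{robber}} \;=\; \mathbf{G}\,\neg\,\mathit{caught}
\]
% Cops' objective (liveness): the robber is eventually caught.
\[
  \varphi_{\text{cops}} \;=\; \mathbf{F}\,\mathit{caught}
\]
% The two objectives are dual, since
\[
  \neg\,\mathbf{G}\,\neg\,\mathit{caught} \;\equiv\; \mathbf{F}\,\mathit{caught}.
\]
```

Every play therefore satisfies exactly one of the two formulas, so a winning strategy for the cops rules out a winning strategy for the robber, and vice versa.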
Deep Learning Agents Trained For Avoidance Behave Like Hawks And Doves
Reddi, Aryaman, Vinnicombe, Glenn
We present heuristically optimal strategies expressed by deep learning agents playing a simple avoidance game. We analyse the learning and behaviour of two agents within a symmetrical grid world that must cross paths to reach a target destination without crashing into each other or straying off the grid world in the wrong direction. The agent policy is determined by one neural network that is employed in both agents. Our findings indicate that the fully trained network exhibits behaviour similar to that of the game Hawks and Doves, in that one agent employs an aggressive strategy to reach the target while the other learns how to avoid the aggressive agent.
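The comparison to Hawks and Doves can be grounded in the textbook version of that game. The sketch below is not the paper's grid world or its learned policy: it uses the classic two-player payoff table with an assumed prize `value` and fight `cost`, and the function names are hypothetical. It enumerates the pure-strategy Nash equilibria by checking for profitable unilateral deviations.

```python
from itertools import product

def hawk_dove_payoffs(value, cost):
    """Row player's payoff u[(mine, theirs)] in the classic Hawk-Dove game."""
    return {
        ("H", "H"): (value - cost) / 2,  # both fight: split prize, share cost
        ("H", "D"): value,               # hawk takes everything from a dove
        ("D", "H"): 0.0,                 # dove yields to a hawk
        ("D", "D"): value / 2,           # two doves share the prize
    }

def pure_nash(value, cost):
    """List strategy pairs from which neither player gains by deviating."""
    u = hawk_dove_payoffs(value, cost)
    equilibria = []
    for row, col in product("HD", repeat=2):
        # The game is symmetric, so the column player's payoff is u[(col, row)].
        row_ok = all(u[(row, col)] >= u[(alt, col)] for alt in "HD")
        col_ok = all(u[(col, row)] >= u[(alt, row)] for alt in "HD")
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria

# Fighting costs more than the prize is worth: roles split asymmetrically.
costly_fight = pure_nash(value=2.0, cost=4.0)
```

When the cost of a clash exceeds the prize, the only pure equilibria are the asymmetric pairs in which one player is aggressive and the other yields, which is the pattern the abstract reports for the two trained agents.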