Agents
MAPRO: Recasting Multi-Agent Prompt Optimization as Maximum a Posteriori Inference
Zhang, Zheyuan, Ge, Lin, Li, Hongjiang, Zhu, Weicheng, Zhang, Chuxu, Ye, Yanfang
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, and LLM-based agents further extend these abilities to various practical workflows. While recent progress shows that multi-agent systems (MAS) can outperform single agents by coordinating specialized roles, designing effective MAS remains difficult due to prompt sensitivity and the compounded instability MAS creates. To cope with the challenge, recent efforts in automated prompt design have reduced manual effort. However, multi-agent prompt optimization remains largely unexplored. Challenges like exponentially expanding search space and ambiguous credit assignment together make systematic design intractable without principled methods. Therefore, we introduce M}ulti-Agent PRompt Optimization (MAPRO), a four-stage framework that first formulates MAS prompt optimization as a Maximum a Posteriori (MAP) inference problem and solves it using a language-guided variant of max-product belief propagation algorithm. To address credit assignment and updates the system iteratively, MAPRO employs a topology-aware refinement mechanism that integrates execution feedback and downstream blames to selectively update agent prompts. Through this process, MAPRO progressively converges to a coordinated set of agent-specific prompt policies. Across benchmarks in various tasks, MAPRO achieves state-of-the-art performance, consistently surpassing manually engineered baselines and recent automated alternatives. Beyond performance, our MAP-based formulation also delivers general guidelines for building more reliable and principled multi-agent systems in the future
Network Topology and Information Efficiency of Multi-Agent Systems: Study based on MARL
Zhang, Xinren, Cheng, Sixi, Zhong, Zixin, Yu, Jiadong
Multi-agent systems (MAS) solve complex problems through coordinated autonomous entities with individual decision-making capabilities. While Multi-Agent Reinforcement Learning (MARL) enables these agents to learn intelligent strategies, it faces challenges of non-stationarity and partial observability. Communications among agents offer a solution, but questions remain about its optimal structure and evaluation. This paper explores two underexamined aspects: communication topology and information efficiency. We demonstrate that directed and sequential topologies improve performance while reducing communication overhead across both homogeneous and heterogeneous tasks. Additionally, we introduce two metrics -- Information Entropy Efficiency Index (IEI) and Specialization Efficiency Index (SEI) -- to evaluate message compactness and role differentiation. Incorporating these metrics into training objectives improves success rates and convergence speed. Our findings highlight that designing adaptive communication topologies with information-efficient messaging is essential for effective coordination in complex MAS.
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Wei, Hui, Lee, Dong Yoon, Rohal, Shubham, Hu, Zhizhang, Rossi, Ryan, Fang, Shiwei, Pan, Shijia
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation model based methods are developed for specific IoT tasks, making it difficult to compare approaches across IoT domains and limiting guidance for applying them to new tasks. This survey aims to bridge this gap by providing a comprehensive overview of current methodologies and organizing them around four shared performance objectives by different domains: efficiency, context-awareness, safety, and security & privacy. For each objective, we review representative works, summarize commonly-used techniques and evaluation metrics. This objective-centric organization enables meaningful cross-domain comparisons and offers practical insights for selecting and designing foundation model based solutions for new IoT tasks. We conclude with key directions for future research to guide both practitioners and researchers in advancing the use of foundation models in IoT applications.
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Freihaut, Till, Viano, Luca, Cevher, Volkan, Geist, Matthieu, Ramponi, Giorgia
This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring such coefficient. BC exhibits substantial regret in games with high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an $\varepsilon$-Nash equilibrium with $\mathcal{O}(\varepsilon^{-4})$ expert and oracle queries. The latter bypasses completely the best response oracle at the cost of a worse expert query complexity of order $\mathcal{O}(\varepsilon^{-8})$. Finally, we provide numerical evidence, confirming our theoretical findings.
Chisme: Fully Decentralized Differentiated Deep Learning for IoT Intelligence
Kuttivelil, Harikrishna, Obraczka, Katia
As end-user device capability increases and demand for intelligent services at the Internet's edge rise, distributed learning has emerged as a key enabling technology. Existing approaches like federated learning (FL) and decentralized FL (DFL) enable distributed learning among clients, while gossip learning (GL) approaches have emerged to address the potential challenges in resource-constrained, connectivity-challenged infrastructure-less environments. However, most distributed learning approaches assume largely homogeneous data distributions and may not consider or exploit the heterogeneity of clients and their underlying data distributions. This paper introduces Chisme, a novel fully decentralized distributed learning algorithm designed to address the challenges of implementing robust intelligence in network edge contexts characterized by heterogeneous data distributions, episodic connectivity, and sparse network infrastructure. Chisme leverages cosine similarity-based data affinity heuristics calculated from received model exchanges to inform how much influence received models have when merging into the local model. By doing so, it facilitates stronger merging influence between clients with more similar model learning progressions, enabling clients to strategically balance between broader collaboration to build more general knowledge and more selective collaboration to build specific knowledge. We evaluate Chisme against contemporary approaches using image recognition and time-series prediction scenarios while considering different network connectivity conditions, representative of real-world distributed intelligent systems. Our experiments demonstrate that Chisme outperforms state-of-the-art edge intelligence approaches in almost every case -- clients using Chisme exhibit faster training convergence, lower final loss after training, and lower performance disparity between clients.
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Xue, Xiangyuan, Zhou, Yifan, Zhang, Guibin, Zhang, Zaibin, Li, Yijiang, Zhang, Chen, Yin, Zhenfei, Torr, Philip, Ouyang, Wanli, Bai, Lei
Self-evolution is a central research topic in enabling large language model (LLM)- based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents. Our dataset and code are available at https://github.com/xxyQwQ/CoMAS. Self-evolution has emerged as a central research theme for large language model (LLM)-based agents, aiming to endow agents with the capacity to continually enhance their capabilities through interaction with the environment (Tao et al., 2024; Gao et al., 2025b; Fang et al., 2025), rather than remaining stagnant after pretraining.
Airy: Reading Robot Intent through Height and Sky
Chen, Baoyang, Xu, Xian, Qu, Huamin
As industrial robots move into shared human spaces, their opaque decision making threatens safety, trust, and public oversight. This artwork, Airy, asks whether complex multi agent AI can become intuitively understandable by staging a competition between two reinforcement trained robot arms that snap a bedsheet skyward. Building on three design principles, competition as a clear metric (who lifts higher), embodied familiarity (audiences recognize fabric snapping), and sensor to sense mapping (robot cooperation or rivalry shown through forest and weather projections), the installation gives viewers a visceral way to read machine intent. Observations from five international exhibitions indicate that audiences consistently read the robots' strategies, conflict, and cooperation in real time, with emotional reactions that mirror the system's internal state. The project shows how sensory metaphors can turn a black box into a public interface.
Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness
Qiu, Jiyang, Ma, Xinbei, Xu, Yunqing, Zhang, Zhuosheng, Zhao, Hai
The rapid deployment of large language model (LLM)-based agents in real-world applications has raised serious concerns about their trustworthiness. In this work, we reveal the security and robustness vulnerabilities of these agents through backdoor attacks. Distinct from traditional backdoors limited to single-step control, we propose the Chain-of-Trigger Backdoor (CoTri), a multi-step backdoor attack designed for long-horizon agentic control. CoTri relies on an ordered sequence. It starts with an initial trigger, and subsequent ones are drawn from the environment, allowing multi-step manipulation that diverts the agent from its intended task. Experimental results show that CoTri achieves a near-perfect attack success rate (ASR) while maintaining a near-zero false trigger rate (FTR). Due to training data modeling the stochastic nature of the environment, the implantation of CoTri paradoxically enhances the agent's performance on benign tasks and even improves its robustness against environmental distractions. Our work highlights that CoTri achieves stable, multi-step control within agents, improving their inherent robustness and task capabilities, which ultimately makes the attack more stealthy and raises potential safty risks. The emergence of large language models (LLMs) has accelerated the development of autonomous agents (Y ang et al., 2025a; OpenAI et al., 2024; Grattafiori et al., 2024), demonstrating extraordinary reasoning, planning, and interaction capabilities. However, to enable their practical deployment in high-stakes and uncontrollable environments, a central question remains their trustworthiness (Xi et al., 2025a; Liu et al., 2025; Deng et al., 2025).
Quantum Agents for Algorithmic Discovery
Kerenidis, Iordanis, Cherrat, El-Amine
We introduce quantum agents trained by episodic, reward-based reinforcement learning to autonomously rediscover several seminal quantum algorithms and protocols. In particular, our agents learn: efficient logarithmic-depth quantum circuits for the Quantum Fourier Transform; Grover's search algorithm; optimal cheating strategies for strong coin flipping; and optimal winning strategies for the CHSH and other nonlocal games. The agents achieve these results directly through interaction, without prior access to known optimal solutions. This demonstrates the potential of quantum intelligence as a tool for algorithmic discovery, opening the way for the automated design of novel quantum algorithms and protocols.
Mitigating Judgment Preference Bias in Large Language Models through Group-Based Polling
Liu, Shuliang, Xu, Zhipeng, Liu, Zhenghao, Yan, Yukun, Yu, Minghe, Gu, Yu, Chen, Chong, Xie, Huiyuan, Yu, Ge
Large Language Models (LLMs) as automatic evaluators, commonly referred to as LLM-as-a-Judge, have also attracted growing attention. This approach plays a vital role in aligning LLMs with human judgments, providing accurate and reliable assessments. However, LLM-based judgment models often exhibit judgment preference bias during the evaluation phase, tending to favor responses generated by themselves, undermining the reliability of their judgments. This paper introduces the Group-Based Polling Optimization (Genii), an unsupervised multi-agent collaborative optimization framework that mitigates the inherent judgment preference bias of judgment models. Specifically, Genii integrates various LLM-based judgment models into a multi-agent system and simulates the interactive client-server polling mechanism to optimize each client agent unsupervisedly. Our experiments demonstrate that Genii outperforms supervised models trained on annotated judgment data, while requiring no human-labeled annotations. Genii consistently improves performance across different client agents during the polling, even when weaker models act as server agents. Further analysis reveals that Genii effectively mitigates judgment preference bias of LLM-based judgment models, demonstrating its effectiveness. All codes are available at https://github.com/NEUIR/Genii.