Agents
Passivity Compensation: A Distributed Approach for Consensus Analysis in Heterogeneous Networks
Su, Yongkang, Khong, Sei Zhen, Su, Lanlan
This paper investigates a passivity-based approach to output consensus analysis in heterogeneous networks composed of non-identical agents coupled via nonlinear interactions, in the presence of measurement and/or communication noise. Focusing on agents that are input-feedforward passive (IFP), we first examine whether a shortage of passivity in some agents can be compensated by a passivity surplus in others, in the sense of preserving the passivity of the transformed open-loop system defined by the agent dynamics and network topology. We show that such compensation is only feasible when at most one agent lacks passivity, and we characterise how this deficit can be offset using the excess passivity within the group of agents. For general networks, we then investigate passivity compensation within the feedback interconnection by leveraging the passivity surplus in the coupling links to locally compensate for the lack of passivity in the adjacent agents. In particular, a distributed condition, expressed in terms of passivity indices and coupling gains, is derived to ensure output consensus of the interconnected network.
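One common way to make passivity "surplus" and "shortage" precise is through an input-feedforward passivity index ν. The form below is a standard textbook-style definition, assumed here for illustration; the paper's exact definition may differ in detail. A positive ν is a surplus, ν = 0 is plain passivity, and a negative ν is a shortage. The static-gain case shows how the index is read off.

```latex
% Input-feedforward passivity (IFP) with index \nu (assumed standard form):
% there is a storage function V(x) \ge 0 such that, along all trajectories,
\dot{V}(x) \le u^{\top} y - \nu\, u^{\top} u .
% Example: static gain y = k u. Then u^{\top} y = k\, u^{\top} u, and with V \equiv 0
% the inequality reads 0 \le (k - \nu)\, u^{\top} u, which holds for every u iff \nu \le k.
% The IFP index is therefore \nu = k: surplus if k > 0, shortage if k < 0.
```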
Nash Q-Network for Multi-Agent Cybersecurity Simulation
Xie, Qintong, Koh, Edward, Cadet, Xavier, Chin, Peter
Cybersecurity defense involves interactions between adversarial parties (namely defenders and hackers), making multi-agent reinforcement learning (MARL) an ideal approach for modeling and learning strategies for these scenarios. This paper addresses one of the key challenges in MARL, the complexity of simultaneously training agents in nontrivial environments, and presents a novel policy-based Nash Q-learning method that converges directly to a steady equilibrium. We demonstrate the successful implementation of this algorithm in a notably complex cyber defense simulation formulated as a two-player zero-sum Markov game. We propose the Nash Q-Network, which aims to learn Nash-optimal strategies that translate to robust defenses in cybersecurity settings. Our approach incorporates aspects of proximal policy optimization (PPO), deep Q-network (DQN), and the Nash-Q algorithm, addressing common challenges like non-stationarity and instability in multi-agent learning. The training process employs distributed data collection and carefully designed neural architectures for both agents and critics.
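The abstract does not spell out the update rule. In the classic Nash-Q algorithm specialised to two-player zero-sum games, the Q-target backs up the minimax (Nash) value of the next state's stage game. The sketch below is an illustration under that assumption, not the paper's method. It computes the stage-game value in closed form for a 2×2 game; larger action spaces need a linear program. The PPO/DQN components are not modelled, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix2x2 = List[List[float]]


def zero_sum_value_2x2(A: Matrix2x2) -> Tuple[float, float]:
    """Return (value, p) of a 2x2 zero-sum game in which the row player
    maximises A[i][j]; p is the row player's probability of playing row 0."""
    (a, b), (c, d) = A
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Pure saddle point: the row player commits to its maximin row.
        p = 1.0 if row_mins[0] >= row_mins[1] else 0.0
        return maximin, p
    # No saddle point: both players mix so that the opponent is indifferent.
    denom = a + d - b - c  # nonzero whenever no saddle point exists
    value = (a * d - b * c) / denom
    p = (d - c) / denom
    return value, p


def nash_q_target(reward: float, gamma: float, q_next: Matrix2x2) -> float:
    """Zero-sum Nash-Q backup: r + gamma * Nash value of the next stage game."""
    value, _ = zero_sum_value_2x2(q_next)
    return reward + gamma * value


# Matching pennies has value 0, so the target reduces to the immediate reward.
print(nash_q_target(1.0, 0.9, [[1, -1], [-1, 1]]))
```

The closed form keeps the example dependency-free. A practical implementation would solve the stage game with an LP solver at every backup.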
Social World Models
Zhou, Xuhui, Liu, Jiarui, Yerukola, Akhila, Kim, Hyunwoo, Sap, Maarten
Humans intuitively navigate social interactions by simulating unspoken dynamics and reasoning about others' perspectives, even with limited information. In contrast, AI systems struggle to automatically structure and reason about these implicit social contexts. In this paper, we introduce a novel structured social world representation formalism (S3AP), designed to help AI systems reason more effectively about social dynamics. Following a POMDP-driven design, S3AP represents social interactions as structured tuples, such as state, observation, agent actions, and mental states, which can be automatically induced from free-form narratives or other inputs. We first show S3AP can help LLMs better understand social narratives across 5 social reasoning tasks (e.g., +51% improvement on FANToM's theory-of-mind reasoning with OpenAI's o1), reaching new state-of-the-art (SOTA) performance. We then induce social world models from these structured representations, demonstrating their ability to predict future social dynamics and improve agent decision-making, yielding up to +18% improvement on the SOTOPIA social interaction benchmark. Our findings highlight the promise of S3AP as a powerful, general-purpose representation for social world states, enabling the development of more socially aware systems that better navigate social interactions.
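To illustrate what a POMDP-style structured tuple might look like, here is a minimal data-structure sketch. The field names and the `view_of` helper are hypothetical and do not reproduce the actual S3AP schema. The key idea is that each agent sees only its own observation and mental state, not the full world state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SocialStep:
    """One hypothetical S3AP-style step, modelled on a POMDP tuple."""
    state: str                                                   # full world state (free text)
    observations: Dict[str, str] = field(default_factory=dict)   # what each agent perceives
    actions: Dict[str, str] = field(default_factory=dict)        # what each agent does or says
    mental_states: Dict[str, str] = field(default_factory=dict)  # beliefs/goals per agent

    def view_of(self, agent: str) -> Dict[str, str]:
        """Return only what `agent` can access: its own observation and mental
        state, hiding the full state and other agents' minds."""
        return {
            "observation": self.observations.get(agent, ""),
            "mental_state": self.mental_states.get(agent, ""),
        }


# Usage: a false-belief scenario where Sally misses the ball being moved.
step = SocialStep(
    state="The ball is in the box.",
    observations={"Sally": "Sally is out of the room.",
                  "Anne": "Anne moves the ball to the box."},
    actions={"Anne": "move ball from basket to box"},
    mental_states={"Sally": "believes the ball is in the basket"},
)
print(step.view_of("Sally"))
```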
MobiAgent: A Systematic Framework for Customizable Mobile Agents
Zhang, Cheng, Feng, Erhu, Zhao, Xi, Zhao, Yisheng, Gong, Wangbo, Sun, Jiahui, Du, Dong, Hua, Zhichao, Xia, Yubin, Chen, Haibo
With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in real-world task execution, particularly in terms of accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, recognizing that the capabilities of current mobile agents are still limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
Mean-payoff and Energy Discrete Bidding Games
A bidding game is played on a graph as follows. A token is placed on an initial vertex, and both players are allocated budgets. In each turn, the players simultaneously submit bids that do not exceed their available budgets; the higher bidder moves the token and pays the bid to the lower bidder. We focus on discrete bidding, which is motivated by practical applications and restricts the granularity of the players' bids, e.g., bids must be given in cents. We study, for the first time, discrete-bidding games with mean-payoff and energy objectives. In contrast, mean-payoff continuous-bidding games (i.e., with no granularity restrictions) are well understood and exhibit a rich mathematical structure. The threshold budget is the necessary and sufficient initial budget for winning an energy game or guaranteeing a target payoff in a mean-payoff game. We first establish the existence of threshold budgets, a non-trivial property due to the concurrent moves of the players. Moreover, we identify the structure of the thresholds, which is key to obtaining compact strategies and, in turn, to showing that finding thresholds is in NP and coNP even in succinctly represented games.
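The turn structure above can be simulated directly. This minimal sketch rests on these assumptions:

- Budgets and bids are integers, with one unit being one cent.
- The winner of the bid picks an out-edge, and each edge carries an integer weight.
- The energy level is the running sum of edge weights.
- Ties go to Player 1. The abstract does not state a tie-breaking rule, and the paper's rule may differ.

All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical weighted game graph: vertex -> [(successor, edge weight)]
Graph = Dict[str, List[Tuple[str, int]]]


@dataclass
class BiddingState:
    vertex: str
    budgets: Tuple[int, int]  # integer budgets (discrete bidding, e.g. cents)
    energy: int = 0           # running sum of traversed edge weights


def bidding_turn(state: BiddingState, bids: Tuple[int, int],
                 choices: Tuple[str, str], graph: Graph) -> BiddingState:
    """Play one turn: higher bid wins, the winner pays its bid to the loser
    and moves the token to its chosen successor."""
    b1, b2 = state.budgets
    bid1, bid2 = bids
    if not all(isinstance(x, int) for x in bids):
        raise ValueError("discrete bidding: bids must be integers")
    if not (0 <= bid1 <= b1 and 0 <= bid2 <= b2):
        raise ValueError("bids must be non-negative and within budget")
    # ASSUMPTION: ties are resolved in favour of Player 1 (index 0).
    winner = 0 if bid1 >= bid2 else 1
    pay = bids[winner]
    # Payment goes to the other player, so the total budget is conserved.
    budgets = (b1 - pay, b2 + pay) if winner == 0 else (b1 + pay, b2 - pay)
    target = choices[winner]
    weights = dict(graph[state.vertex])
    if target not in weights:
        raise ValueError(f"{target!r} is not a successor of {state.vertex!r}")
    return BiddingState(target, budgets, state.energy + weights[target])


graph = {"v0": [("v1", 2), ("v2", -1)], "v1": [("v0", 0)], "v2": [("v0", 0)]}
s0 = BiddingState("v0", (5, 3))
s1 = bidding_turn(s0, (2, 1), ("v1", "v2"), graph)  # Player 1 wins with bid 2
print(s1)
```

Integer types make the granularity restriction explicit. A continuous-bidding variant would use real-valued budgets instead.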
Multi-Agent Data Visualization and Narrative Generation
Wolter, Anton, Vidalakis, Georgios, Yu, Michael, Grover, Ankit, Dhanoa, Vaishali
Recent advancements in the field of AI agents have impacted the way we work, enabling greater automation and collaboration between humans and agents. In the data visualization field, multi-agent systems can be useful for employing agents throughout the entire data-to-communication pipeline. We present a lightweight multi-agent system that automates the data analysis workflow, from data exploration to generating coherent visual narratives for insight communication. Our approach combines a hybrid multi-agent architecture with deterministic components, strategically externalizing critical logic from LLMs to improve transparency and reliability. The system delivers granular, modular outputs that enable surgical modifications without full regeneration, supporting sustainable human-AI collaboration. We evaluated our system across 4 diverse datasets, demonstrating strong generalizability, narrative quality, and computational efficiency with minimal dependencies.
NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks
Chien, Yen-Che, Wang, Kuang-Da, Wang, Wei-Yao, Peng, Wen-Chih
Recent advances in autonomous digital agents from industry (e.g., Manus AI and Gemini's research mode) highlight their potential for structured tasks through autonomous decision-making and task decomposition; however, it remains unclear to what extent agent-based systems can improve productivity with multimodal web data. We study this in the realm of journalism, which requires iterative planning, interpretation, and contextual reasoning over multimodal raw content to produce a well-structured news article. We introduce NEWSAGENT, a benchmark for evaluating how well agents can automatically search available raw content, select the desired information, and edit and rephrase it into a news article by accessing core journalistic functions. Given a writing instruction and firsthand data, mirroring how a journalist initiates a news draft, agents are tasked with identifying narrative perspectives, issuing keyword-based queries, retrieving historical background, and generating complete articles. Unlike typical summarization or retrieval tasks, essential context is not directly available and must be actively discovered, reflecting the information gaps faced in real-world news writing. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal content converted to text for broad model compatibility. We evaluate open- and closed-source LLMs with commonly used agentic frameworks on NEWSAGENT; the results show that agents are capable of retrieving relevant facts but struggle with planning and narrative integration. We believe that NEWSAGENT serves as a realistic testbed for iterating on and evaluating agent capabilities in manipulating multimodal web data for real-world productivity.
KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation
Guan, Ziyi, Li, Jason Chun Lok, Hou, Zhijian, Zhang, Pingping, Xu, Donglai, Zhao, Yuzhi, Wu, Mengyang, Chen, Jinpeng, Nguyen, Thanh-Toan, Xian, Pengfei, Ma, Wenao, Qin, Shengchao, Chesi, Graziano, Wong, Ngai
Despite recent progress, Graphical User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation representations, they are underutilized due to poor extraction and inefficient integration. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. By leveraging an intent-guided LLM search method, KG-RAG generates actionable navigation paths, enhancing agent decision-making. Experiments across diverse mobile apps show that KG-RAG outperforms existing methods, achieving a 75.8% success rate (8.9% improvement over AutoDroid), 84.6% decision accuracy (8.1% improvement), and reducing average task steps from 4.5 to 4.1. Additionally, we present KG-Android-Bench and KG-Harmony-Bench, two benchmarks tailored to the Chinese mobile ecosystem for future research. Finally, KG-RAG transfers to web/desktop (+40% SR on Weibo-web; +20% on QQ Music-desktop), and a UTG cost ablation shows accuracy saturates at ~4h per complex app, enabling practical deployment trade-offs.
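A UTG can be viewed as a directed graph in which nodes are screens and edges are labelled with UI actions. The abstract's KG-RAG derives navigation paths from UTGs through intent-guided LLM search over a vector database. This sketch substitutes a much simpler technique, plain breadth-first search, to show what an "actionable navigation path" is: the shortest sequence of UI actions from the current screen to a target screen. In KG-RAG the target would come from the user's intent. Here it is given explicitly, and all screen and action names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical UTG: screen -> [(UI action, resulting screen)]
UTG = Dict[str, List[Tuple[str, str]]]


def navigation_path(utg: UTG, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest action sequence from start to goal.
    Returns None if the goal screen is unreachable."""
    parents: Dict[str, Tuple[str, str]] = {}  # screen -> (previous screen, action)
    seen = {start}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        if screen == goal:
            # Walk parent links back to the start, then reverse.
            actions: List[str] = []
            while screen != start:
                prev, action = parents[screen]
                actions.append(action)
                screen = prev
            return actions[::-1]
        for action, nxt in utg.get(screen, []):
            if nxt not in seen:
                seen.add(nxt)
                parents[nxt] = (screen, action)
                queue.append(nxt)
    return None


utg = {
    "Home": [("tap Search", "Search"), ("tap Profile", "Profile")],
    "Search": [("type query", "Results")],
    "Results": [("tap first song", "Player")],
    "Profile": [("tap Settings", "Settings")],
}
print(navigation_path(utg, "Home", "Player"))
```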
HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution
Tang, Jinzhou, Zhang, Jusheng, Lv, Qinhan, Liu, Sidi, Yang, Jing, Tang, Chengpei, Wang, Keze
Autonomous agents play a crucial role in advancing Artificial General Intelligence, enabling problem decomposition and tool orchestration through Large Language Models (LLMs). However, existing paradigms face a critical trade-off. On one hand, reusable fixed workflows require manual reconfiguration upon environmental changes; on the other hand, flexible reactive loops fail to distill reasoning progress into transferable structures. We introduce Hierarchical Variable Agent (HiVA), a novel framework modeling agentic workflows as self-organized graphs with the Semantic-Topological Evolution (STEV) algorithm, which optimizes hybrid semantic-topological spaces using textual gradients as discrete-domain surrogates for backpropagation. The iterative process comprises Multi-Armed Bandit-infused forward routing, diagnostic gradient generation from environmental feedback, and coordinated updates that co-evolve individual semantics and topology for collective optimization in unknown environments. Experiments on dialogue, coding, long-context Q&A, mathematical, and agentic benchmarks demonstrate improvements of 5-10% in task accuracy and enhanced resource efficiency over existing baselines, establishing HiVA's effectiveness in autonomous task execution.
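The abstract mentions Multi-Armed Bandit-infused forward routing but does not name the bandit algorithm. This sketch assumes the standard UCB1 rule purely for illustration. It shows how a node in the workflow graph might choose which successor agent receives a subtask, balancing past reward against exploration. The class name, successor names, and reward signal are hypothetical, and STEV's textual-gradient updates are not modelled.

```python
import math
from typing import Dict, List


class UCB1Router:
    """Choose a successor node with the UCB1 bandit rule (assumed bandit choice)."""

    def __init__(self, successors: List[str], c: float = math.sqrt(2)):
        self.c = c  # exploration strength
        self.counts: Dict[str, int] = {s: 0 for s in successors}
        self.totals: Dict[str, float] = {s: 0.0 for s in successors}

    def select(self) -> str:
        # Route to every successor once before applying the confidence bound.
        for s, n in self.counts.items():
            if n == 0:
                return s
        t = sum(self.counts.values())

        def ucb(s: str) -> float:
            mean = self.totals[s] / self.counts[s]
            return mean + self.c * math.sqrt(math.log(t) / self.counts[s])

        return max(self.counts, key=ucb)

    def update(self, successor: str, reward: float) -> None:
        # The reward would come from environment feedback on the routed subtask.
        self.counts[successor] += 1
        self.totals[successor] += reward


router = UCB1Router(["coder", "searcher"])
first = router.select()      # unexplored successors are tried first
router.update(first, 1.0)
```

UCB1 is used here only because it is a well-known, parameter-light bandit. Other bandit rules, such as Thompson sampling, would fit the same interface.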
Virtual Group Knowledge and Group Belief in Topological Evidence Models (Extended Version)
Baltag, Alexandru, Gattinger, Malvin, Gomes, Djanira
We study notions of (virtual) group knowledge and group belief within multi-agent evidence models, obtained by extending the topological semantics of evidence-based belief and fallible knowledge from individuals to groups. We completely axiomatize and show the decidability of the logic of ("hard" and "soft") group evidence, and do the same for an especially interesting fragment of it: the logic of group knowledge and group belief. We also extend these languages with dynamic evidence-sharing operators, and completely axiomatize the corresponding logics, showing that they are co-expressive with their static bases.