Agent Societies
Learning to Communicate in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence
Contractor, Faizan, Li, Li, Mallah, Ranwa Al
Popular methods in cooperative Multi-Agent Reinforcement Learning with partially observable environments typically allow agents to act independently during execution, which may limit the coordinated effect of the trained policies. However, by sharing information such as known or suspected ongoing threats, effective communication can lead to improved decision-making in the cyber battle space. We propose a game design where defender agents learn to communicate and defend against imminent cyber threats by playing training games in the Cyber Operations Research Gym, using the Differentiable Inter Agent Learning algorithm adapted to the cyber operational environment. The tactical policies learned by these autonomous agents are akin to those of human experts during incident responses to avert cyber threats. In addition, the agents simultaneously learn minimal cost communication messages while learning their defence tactical policies.
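The abstract does not say how the algorithm was adapted to the cyber environment, so the following is only a minimal sketch of the message channel used in the original Differentiable Inter-Agent Learning formulation. During training the message is a noisy, real-valued signal that gradients can pass through. During execution it is thresholded to a single bit. The function names and the default `noise_std` value are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dru(message_logit, training, noise_std=2.0, rng=random):
    """Discretise/regularise unit in the style of DIAL.

    training=True  -> noisy continuous message in (0, 1), differentiable
                      with respect to message_logit in an autodiff framework.
    training=False -> hard one-bit message, cheap to transmit.
    noise_std is a hypothetical default chosen for illustration.
    """
    if training:
        noisy = message_logit + rng.gauss(0.0, noise_std)
        return sigmoid(noisy)
    return 1.0 if message_logit > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    print("training message:", dru(1.5, training=True, rng=rng))
    print("execution message:", dru(1.5, training=False))
```

The noise added during training pushes agents toward logits far from zero, which is one way a low-cost, near-binary message can emerge.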
Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation
Ma, Xiaowen, Lin, Chenyang, Zhang, Yao, Tresp, Volker, Ma, Yunpu
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative "team" focused on a specific subtask. ANN follows a two-phase optimization strategy: (1) Forward Phase: drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase: mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles. We plan to open-source the entire framework.
Coral Protocol: Open Infrastructure Connecting The Internet of Agents
Georgio, Roman J., Forder, Caelum, Deb, Suman, Rahimov, Andri, Carroll, Peter, Gürcan, Önder
Coral Protocol is an open and decentralized collaboration infrastructure that enables communication, coordination, trust and payments for The Internet of Agents. It addresses the growing need for interoperability in a world where organizations are deploying multiple specialized AI agents that must work together across domains and vendors. As a foundational platform for multi-agent AI ecosystems, Coral establishes a common language and coordination framework allowing any agent to participate in complex workflows with others. Its design emphasizes broad compatibility, security, and vendor neutrality, ensuring that agent interactions are efficient and trustworthy. In particular, Coral introduces standardized messaging formats for agent communication, a modular coordination mechanism for orchestrating multi-agent tasks, and secure team formation capabilities for dynamically assembling trusted groups of agents. Together, these innovations position Coral Protocol as a cornerstone of the emerging "Internet of Agents," unlocking new levels of automation, collective intelligence, and business value through open agent collaboration.
Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems
Ko, Ronny, Jeong, Jiseong, Zheng, Shuyuan, Xiao, Chuan, Kim, Tae-Wan, Onizuka, Makoto, Shin, Won-Yong
Large language models (LLMs) are rapidly evolving into autonomous agents that cooperate across organizational boundaries, enabling joint disaster response, supply-chain optimization, and other tasks that demand decentralized expertise without surrendering data ownership. Yet, cross-domain collaboration shatters the unified trust assumptions behind current alignment and containment techniques. An agent benign in isolation may, when receiving messages from an untrusted peer, leak secrets or violate policy, producing risks driven by emergent multi-agent dynamics rather than classical software bugs. This position paper maps the security agenda for cross-domain multi-agent LLM systems. We introduce seven categories of novel security challenges, for each of which we also present plausible attacks, security evaluation metrics, and future research guidelines.
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Zhang, Xinnong, Lin, Jiayu, Mou, Xinyi, Yang, Shiyue, Liu, Xiawei, Sun, Libo, Lyu, Hanjia, Yang, Yihang, Qi, Weihong, Chen, Yue, Li, Guanying, Yan, Ling, Hu, Yao, Chen, Siming, Wang, Yu, Huang, Xuanjing, Luo, Jiebo, Tang, Shiping, Wu, Libo, Zhou, Baohua, Wei, Zhongyu
Social simulation is transforming traditional social science research by modeling human behavior through interactions between virtual individuals and their environments. With recent advances in large language models (LLMs), this approach has shown growing potential in capturing individual differences and predicting group behaviors. However, existing methods face alignment challenges related to the environment, target users, interaction mechanisms, and behavioral patterns. To this end, we introduce SocioVerse, an LLM-agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. To validate its effectiveness, we conducted large-scale simulation experiments across three distinct domains: politics, news, and economics. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness through standardized procedures and minimal manual adjustments.
AI Agent Architecture for Decentralized Trading of Alternative Assets
Borjigin, Ailiya, He, Cong, Lee, Charles CC, Zhou, Wei
Decentralized trading of real-world alternative assets (e.g., gold) requires bridging physical asset custody with blockchain systems while meeting strict requirements for compliance, liquidity, and risk management. We present a research-oriented architecture, GoldMine OS, that employs multiple specialized AI agents to automate and secure the tokenization and exchange of physical gold into a blockchain-based stablecoin ("OZ"). We detail the design of four cooperative agents (for Compliance, Token Issuance, Market-Making, and Risk Control) and a coordinating core, and we evaluate the system through both simulation and a controlled pilot deployment. In experiments, the prototype achieves on-demand token issuance in under 1.2 s, a speed-up of over 100× compared to traditional manual workflows. The integrated Market-Making agent provides tight liquidity (spreads often <0.5%) even under volatile market conditions. Through fault injection tests, we demonstrate the system's resilience: an oracle price spoofing attack is detected and mitigated within 10 s, and a simulated vault mis-reporting triggers an immediate halt of issuances with minimal impact on users. Our results indicate that an AI-agent-based decentralized exchange for alternative assets can meet rigorous performance and safety requirements. We discuss the broader implications for democratizing access to traditionally illiquid assets and outline how our governance model (multi-signature agent updates and on-chain community voting on risk parameters) ensures ongoing transparency, adaptability, and formal assurance of system integrity.
Tokenizing real-world assets (RWAs) like precious metals on blockchains promises to democratize access to alternative investments, but it raises significant challenges in trust, compliance, and market stability [1] [2]. For instance, gold-backed cryptocurrencies such as PAX Gold (PAXG) and Tether Gold (XAUT) peg digital tokens to physical gold reserves, yet they rely heavily on centralized processes for custody and compliance [2]. Achieving a truly decentralized yet regulatorily compliant trading platform for assets like gold remains an open problem. Key hurdles include ensuring that on-chain token supply always mirrors off-chain reserves (requiring robust proof-of-reserve mechanisms), automating complex compliance checks (KYC/AML) in a user-friendly manner, providing continuous liquidity in thinly traded assets, and guarding against failures of external data sources (the well-known oracle problem [3]). In this paper, we address these challenges by designing and evaluating GoldMine OS, an AI-driven multi-agent architecture for decentralized trading of gold-backed tokens.
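Two of the hurdles named above, keeping token supply within attested reserves and guarding against a spoofed price feed, can be illustrated with simple checks. The sketch below is not the GoldMine OS implementation. The function names, the ounce-denominated inputs, and the 2% deviation threshold are all hypothetical and serve only to make the ideas concrete.

```python
from statistics import median


def issuance_allowed(token_supply_oz, mint_request_oz, attested_reserve_oz):
    """Proof-of-reserve style guard (hypothetical): permit a mint only if
    the post-mint token supply stays within the attested vault reserves."""
    return token_supply_oz + mint_request_oz <= attested_reserve_oz


def suspicious_feeds(prices, max_deviation=0.02):
    """Flag price feeds that deviate from the cross-feed median by more
    than max_deviation (a relative threshold assumed for illustration).

    prices: dict mapping feed name -> quoted price.
    Returns a sorted list of the names of the flagged feeds.
    """
    mid = median(prices.values())
    return sorted(
        name for name, price in prices.items()
        if abs(price - mid) / mid > max_deviation
    )


if __name__ == "__main__":
    print(issuance_allowed(900.0, 100.0, 1000.0))
    print(suspicious_feeds({"feed_a": 2000.0, "feed_b": 2004.0, "feed_c": 2600.0}))
```

A median is used here because a single manipulated feed cannot move it far. A production system would also need signed attestations and staleness checks, which this sketch omits.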
A Learning Framework For Cooperative Collision Avoidance of UAV Swarms Leveraging Domain Knowledge
Huang, Shuangyao, Zhang, Haibo, Huang, Zhiyi
This paper presents a multi-agent reinforcement learning (MARL) framework for cooperative collision avoidance of UAV swarms that leverages a domain knowledge-driven reward. The reward is derived from knowledge in the domain of image processing, approximating contours on a two-dimensional field. By modeling obstacles as maxima on the field, collisions are inherently avoided, as contours never pass through peaks or intersect. Additionally, contours are smooth and energy-efficient. Our framework enables training with large swarm sizes, as agent interaction is minimized and the need for the complex credit assignment schemes or observation sharing mechanisms of state-of-the-art MARL approaches is eliminated. Moreover, through intensive training, UAVs gain the ability to adapt to complex environments where contours may be nonviable or non-existent. Extensive experiments are conducted to evaluate the performance of our framework against state-of-the-art MARL algorithms.
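The abstract does not give the reward formula, so the following is only a sketch of the general idea under stated assumptions. Obstacles are modeled as Gaussian bumps on a scalar field, and each UAV is rewarded for staying on a contour of fixed level. The Gaussian shape, the `sigma` parameter, and the absolute-deviation penalty are all illustrative choices, not the authors' definition.

```python
import math


def field_value(pos, obstacles, sigma=1.0):
    """Scalar field with a Gaussian maximum at each obstacle (assumed shape)."""
    x, y = pos
    return sum(
        math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * sigma ** 2))
        for ox, oy in obstacles
    )


def contour_reward(pos, obstacles, level, sigma=1.0):
    """Reward is zero on the contour field == level and negative elsewhere.

    Because obstacles are the maxima of the field, a contour with a level
    below the peak value never passes through an obstacle.
    """
    return -abs(field_value(pos, obstacles, sigma) - level)


if __name__ == "__main__":
    obstacles = [(0.0, 0.0)]
    level = math.exp(-0.5)  # contour one sigma away from a lone obstacle
    print(contour_reward((1.0, 0.0), obstacles, level))  # on the contour
    print(contour_reward((0.0, 0.0), obstacles, level))  # at the obstacle
```

Each agent's reward here depends only on its own position and the field, which is consistent with the abstract's claim that agent interaction is minimized.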
AI Mother Tongue: Self-Emergent Communication in MARL via Endogenous Symbol Systems
In Decentralized Multi-Agent Reinforcement Learning (MARL), the development of Emergent Communication has long been constrained by the "Joint Exploration Dilemma", leading agents to fall into a "Communication Vacuum Equilibrium". Traditional methods address this by introducing inductive biases to facilitate communication emergence. This study fundamentally questions whether such artificial inductive biases are, in fact, over-engineering. Through experiments with the "AI Mother Tongue" (AIM) framework, based on a Vector Quantized Variational Autoencoder (VQ-VAE), we demonstrate that when agents possess an endogenous symbol system, their neural representations naturally exhibit spontaneous semantic compression and Nash equilibrium-driven semantic convergence, achieving effective symbolic communication without external inductive biases. This aligns with recent neuroscience findings suggesting that the human brain does not directly use human language for internal thought, and resonates with research on "soft thinking" capabilities in Large Language Models (LLMs). Compared to traditional explicit communication methods, AIM demonstrates stronger generality and efficiency. The interpretable analysis toolkit developed in this study confirms that symbol usage exhibits a significant power-law distribution, leading to three major theoretical insights: the "Neural Communication Hypothesis", the "Tool-First Principle", and the "Semantic Interpretability Paradigm". Future research will explore the integration of Hierarchical Quantized Variational Autoencoders (HQ-VAE) to enhance AIM's complex expressive capabilities and investigate the potential for "Reinforcement Learning (RL) Low-Level Pre-training". This discovery offers new avenues for bridging symbolism and connectionism.
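The abstract does not describe AIM's architecture beyond its VQ-VAE basis, so the sketch below shows only the standard vector-quantization step that a discrete symbol system of this kind builds on. A continuous latent vector is replaced by its nearest codebook entry, and the entry's index acts as the symbol. The codebook values and function name are made up for illustration.

```python
def quantize(z, codebook):
    """Map a continuous latent z to its nearest codebook entry.

    Returns (index, code_vector). The index is the discrete symbol and
    the code vector is what a decoder would receive. Distance is squared
    Euclidean, as in the standard VQ-VAE formulation.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]


if __name__ == "__main__":
    # Hypothetical three-symbol codebook over a two-dimensional latent space.
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    symbol, vector = quantize([0.9, 1.2], codebook)
    print(symbol, vector)
```

In a trained model the codebook is learned jointly with the encoder. This sketch covers only the lookup, not the training losses.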
Adaptability in Multi-Agent Reinforcement Learning: A Framework and Unified Review
Hu, Siyi, Hady, Mohamad A, Qiao, Jianglin, Cao, Jimmy, Pratama, Mahardhika, Kowalczyk, Ryszard
Multi-Agent Reinforcement Learning (MARL) has shown clear effectiveness in coordinating multiple agents across simulated benchmarks and constrained scenarios. However, its deployment in real-world multi-agent systems (MAS) remains limited, primarily due to the complex and dynamic nature of such environments. These challenges arise from multiple interacting sources of variability, including fluctuating agent populations, evolving task goals, and inconsistent execution conditions. Together, these factors demand that MARL algorithms remain effective under continuously changing system configurations and operational demands. To better capture and assess this capacity for adjustment, we introduce the concept of adaptability as a unified and practically grounded lens through which to evaluate the reliability of MARL algorithms under shifting conditions, broadly referring to any changes in the environment dynamics that may occur during learning or execution. Centred on the notion of adaptability, we propose a structured framework comprising three key dimensions: learning adaptability, policy adaptability, and scenario-driven adaptability. By adopting this adaptability perspective, we aim to support more principled assessments of MARL performance beyond narrowly defined benchmarks. Ultimately, this survey contributes to the development of algorithms that are better suited for deployment in dynamic, real-world multi-agent systems.
Large Population Models
Many of society's most pressing challenges, from pandemic response to supply chain disruptions to climate adaptation, emerge from the collective behavior of millions of autonomous agents making decisions over time. Large Population Models (LPMs) offer an approach to understanding these complex systems by simulating entire populations with realistic behaviors and interactions at unprecedented scale. LPMs extend traditional modeling approaches through three key innovations: computational methods that efficiently simulate millions of agents simultaneously, mathematical frameworks that learn from diverse real-world data streams, and privacy-preserving communication protocols that bridge virtual and physical environments. This allows researchers to observe how agent behavior aggregates into system-level outcomes and to test interventions before real-world implementation. While current AI advances primarily focus on creating "digital humans" with sophisticated individual capabilities, LPMs develop "digital societies" where the richness of interactions reveals emergent phenomena. By bridging individual agent behavior and population-scale dynamics, LPMs offer a complementary path in AI research, illuminating collective intelligence and providing testing grounds for policies and social innovations before real-world deployment. We discuss the technical foundations and some open problems here. LPMs are implemented by the AgentTorch framework (github.com/AgentTorch/AgentTorch).