Agent Societies
Collaborative Multi-Agent Reinforcement Learning for Automated Feature Transformation with Graph-Driven Path Optimization
Huang, Xiaohan, Wang, Dongjie, Ning, Zhiyuan, Qiao, Ziyue, Long, Qingqing, Zhu, Haowei, Du, Yi, Wu, Min, Zhou, Yuanchun, Xiao, Meng
--Feature transformation methods aim to find an optimal mathematical feature-feature crossing process that generates high-value features and improves the performance of downstream machine learning tasks. Existing frameworks, though designed to mitigate manual costs, often treat feature transformations as isolated operations, ignoring dynamic dependencies between transformation steps. T o address the limitations, we propose TCTO, a collaborative multi-agent reinforcement learning framework that automates feature engineering through graph-driven path optimization. The framework's core innovation lies in an evolving interaction graph that models features as nodes and transformations as edges. Through graph pruning and backtracking, it dynamically eliminates low-impact edges, reduces redundant operation, and enhances exploration stability. This graph also provides full traceability to empower TCTO to reuse high-utility subgraphs from historical transformations. T o demonstrate the efficacy and adaptability of our approach, we conduct comprehensive experiments and case studies, which show superior performance across a range of datasets. LASSICAL machine learning (ML) heavily relies on the structure of the model and the quality of the involving features [1]-[4]. This dependency makes designing effective features a crucial step before the learning process. Traditionally, designing effective features required extensive manual intervention, where scientists applied mathematical transformations to raw data to create meaningful ones [5], [6]. This process, illustrated in Figure 1, is known as feature transformation [7]-[9]. Xiaohan Huang, Zhiyuan Ning and Qingqing Long are with the Computer Network Information Center, Chinese Academy of Sciences, and the University of the Chinese Academy of Sciences. Yi Du, Y uanchun Zhou, and Meng Xiao are with the Computer Network Information Center, Chinese Academy of Sciences. Dongjie Wang is with the Department of Electrical Engineering and Computer Science at the University of Kansas. Ziyue Qiao is with the School of Computing and Information Technology, Great Bay University, Dongguan, China.
Cognitive Silicon: An Architectural Blueprint for Post-Industrial Computing Systems
Haryanto, Christoforus Yoga, Lomempow, Emily
Autonomous AI systems reveal foundational limitations in deterministic, human-authored computing architectures. This paper presents Cognitive Silicon: a hypothetical full-stack architectural framework projected toward 2035, exploring a possible trajectory for cognitive computing system design. The proposed architecture would integrate symbolic scaffolding, governed memory, runtime moral coherence, and alignment-aware execution across silicon-to-semantics layers. Our design grammar has emerged from dialectical co-design with LLMs under asymmetric epistemic conditions--creating structured friction to expose blind spots and trade-offs. The envisioned framework would establish mortality as a natural consequence of physical constraints, non-copyable tacit knowledge, and non-cloneable identity keys as cognitive-embodiment primitives. Core tensions (trust/agency, scaffolding/emergence, execution/governance) would function as central architectural pressures rather than edge cases. The architecture theoretically converges with the Free Energy Principle, potentially offering a formal account of how cognitive systems could maintain identity through prediction error minimization across physical and computational boundaries. The resulting framework aims to deliver a morally tractable cognitive infrastructure that could maintain human-alignment through irreversible hardware constraints and identity-bound epistemic mechanisms resistant to replication or subversion.
SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation
Zhou, Xuhui, Su, Zhe, Feng, Sophie, Zhou, Jiaxu, Huang, Jen-tse, Kao, Hsien-Te, Lynch, Spencer, Volkova, Svitlana, Wu, Tongshuang Sherry, Woolley, Anita, Zhu, Hao, Sap, Maarten
Social simulation through large language model (LLM) agents is a promising approach to explore and validate hypotheses related to social science questions and LLM agents behavior. We present SOTOPIA-S4, a fast, flexible, and scalable social simulation system that addresses the technical barriers of current frameworks while enabling practitioners to generate multi-turn and multi-party LLM-based interactions with customizable evaluation metrics for hypothesis testing. SOTOPIA-S4 comes as a pip package that contains a simulation engine, an API server with flexible RESTful APIs for simulation management, and a web interface that enables both technical and non-technical users to design, run, and analyze simulations without programming. We demonstrate the usefulness of SOTOPIA-S4 with two use cases involving dyadic hiring negotiation and multi-party planning scenarios.
Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL
Zhang, Songyuan, So, Oswin, Black, Mitchell, Serlin, Zachary, Fan, Chuchu
Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which targets minimizing a global cost and bringing the mean of constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constraint violation. While many safe multi-agent reinforcement learning (MARL) algorithms have been proposed to solve CMDPs, these algorithms suffer from unstable training in this setting. To tackle this, we use the epigraph form for constrained optimization to improve training stability and prove that the centralized epigraph form problem can be solved in a distributed fashion by each agent. This results in a novel centralized training distributed execution MARL algorithm named Def-MARL. Simulation experiments on 8 different tasks across 2 different simulators show that Def-MARL achieves the best overall performance, satisfies safety constraints, and maintains stable training. Real-world hardware experiments on Crazyflie quadcopters demonstrate the ability of Def-MARL to safely coordinate agents to complete complex collaborative tasks compared to other methods.
A biologically Inspired Trust Model for Open Multi-Agent Systems that is Resilient to Rapid Performance Fluctuations
Lygizou, Zoi, Kalles, Dimitris
Trust management provides an alternative solution for securing open, dynamic, and distributed multi-agent systems, where conventional cryptographic methods prove to be impractical. However, existing trust models face challenges related to agent mobility, changing behaviors, and the cold start problem. To address these issues we introduced a biologically inspired trust model in which trustees assess their own capabilities and store trust data locally. This design improves mobility support, reduces communication overhead, resists disinformation, and preserves privacy. Despite these advantages, prior evaluations revealed limitations of our model in adapting to provider population changes and continuous performance fluctuations. This study proposes a novel algorithm, incorporating a self-classification mechanism for providers to detect performance drops potentially harmful for the service consumers. Simulation results demonstrate that the new algorithm outperforms its original version and FIRE, a well-known trust and reputation model, particularly in handling dynamic trustee behavior. While FIRE remains competitive under extreme environmental changes, the proposed algorithm demonstrates greater adaptability across various conditions. In contrast to existing trust modeling research, this study conducts a comprehensive evaluation of our model using widely recognized trust model criteria, assessing its resilience against common trust-related attacks while identifying strengths, weaknesses, and potential countermeasures. Finally, several key directions for future research are proposed.
LOKA Protocol: A Decentralized Framework for Trustworthy and Ethical AI Agent Ecosystems
Ranjan, Rajesh, Gupta, Shailja, Singh, Surya Narayan
The rise of autonomous AI agents, capable of perceiving, reasoning, and acting independently, signals a profound shift in how digital ecosystems operate, govern, and evolve. As these agents proliferate beyond centralized infrastructures, they expose foundational gaps in identity, accountability, and ethical alignment. Three critical questions emerge: Identity: Who or what is the agent? Accountability: Can its actions be verified, audited, and trusted? Ethical Consensus: Can autonomous systems reliably align with human values and prevent harmful emergent behaviors? We present the novel LOKA Protocol (Layered Orchestration for Knowledgeful Agents), a unified, systems-level architecture for building ethically governed, interoperable AI agent ecosystems. LOKA introduces a proposed Universal Agent Identity Layer (UAIL) for decentralized, verifiable identity; intent-centric communication protocols for semantic coordination across diverse agents; and a Decentralized Ethical Consensus Protocol (DECP) that could enable agents to make context-aware decisions grounded in shared ethical baselines. Anchored in emerging standards such as Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), and post-quantum cryptography, LOKA proposes a scalable, future-resilient blueprint for multi-agent AI governance. By embedding identity, trust, and ethics into the protocol layer itself, LOKA proposes the foundation for a new era of responsible, transparent, and autonomous AI ecosystems operating across digital and physical domains.
Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
Montreuil, Yannis, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang
Learning-to-Defer (L2D) enables decision-making systems to improve reliability by selectively deferring uncertain predictions to more competent agents. However, most existing approaches focus exclusively on single-agent deferral, which is often inadequate in high-stakes scenarios that require collective expertise. We propose Top-$k$ Learning-to-Defer, a generalization of the classical two-stage L2D framework that allocates each query to the $k$ most confident agents instead of a single one. To further enhance flexibility and cost-efficiency, we introduce Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of agents to consult for each query, based on input complexity, agent competency distributions, and consultation costs. For both settings, we derive a novel surrogate loss and prove that it is Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent, ensuring convergence to the Bayes-optimal allocation. Notably, we show that the well-established model cascades paradigm arises as a restricted instance of our Top-$k$ and Top-$k(x)$ formulations. Extensive experiments across diverse benchmarks demonstrate the effectiveness of our framework on both classification and regression tasks.
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
Ran, Yiting, Wang, Xintao, Qiu, Tian, Liang, Jiaqing, Xiao, Yanghua, Yang, Deqing
Recent advances in large language models (LLMs) have enabled social simulation through multi-agent systems. Prior efforts focus on agent societies created from scratch, assigning agents with newly defined personas. However, simulating established fictional worlds and characters remain largely underexplored, despite its significant practical value. In this paper, we introduce BookWorld, a comprehensive system for constructing and simulating book-based multi-agent societies. BookWorld's design covers comprehensive real-world intricacies, including diverse and dynamic characters, fictional worldviews, geographical constraints and changes, e.t.c. BookWorld enables diverse applications including story generation, interactive games and social simulation, offering novel ways to extend and explore beloved fictional works. Through extensive experiments, we demonstrate that BookWorld generates creative, high-quality stories while maintaining fidelity to the source books, surpassing previous methods with a win rate of 75.36%. The code of this paper can be found at the project page: https://bookworld2025.github.io/.
Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination
Jha, Kunal, Carvalho, Wilka, Liang, Yancheng, Du, Simon S., Kleiman-Weiner, Max, Jaques, Natasha
Zero-shot coordination (ZSC), the ability to adapt to a new partner in a cooperative task, is a critical component of human-compatible AI. While prior work has focused on training agents to cooperate on a single task, these specialized models do not generalize to new tasks, even if they are highly similar. Here, we study how reinforcement learning on a distribution of environments with a single partner enables learning general cooperative skills that support ZSC with many new partners on many new problems. We introduce two Jax-based, procedural generators that create billions of solvable coordination challenges. We develop a new paradigm called Cross-Environment Cooperation (CEC), and show that it outperforms competitive baselines quantitatively and qualitatively when collaborating with real people. Our findings suggest that learning to collaborate across many unique scenarios encourages agents to develop general norms, which prove effective for collaboration with different partners. Together, our results suggest a new route toward designing generalist cooperative agents capable of interacting with humans without requiring human data.
Self-Resource Allocation in Multi-Agent LLM Systems
Amayuelas, Alfonso, Yang, Jingbo, Agashe, Saaket, Nagarajan, Ashwin, Antoniades, Antonis, Wang, Xin Eric, Wang, William
With the development of LLMs as agents, there is a growing interest in connecting multiple agents into multi-agent systems to solve tasks concurrently, focusing on their role in task assignment and coordination. This paper explores how LLMs can effectively allocate computational tasks among multiple agents, considering factors such as cost, efficiency, and performance. In this work, we address key questions, including the effectiveness of LLMs as orchestrators and planners, comparing their effectiveness in task assignment and coordination. Our experiments demonstrate that LLMs can achieve high validity and accuracy in resource allocation tasks. We find that the planner method outperforms the orchestrator method in handling concurrent actions, resulting in improved efficiency and better utilization of agents. Additionally, we show that providing explicit information about worker capabilities enhances the allocation strategies of planners, particularly when dealing with suboptimal workers.