AITopics | Agent Societies

Collaborating Authors

Agent Societies

News Overviews Instructional Materials AI-Alerts Classics

Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL

Formanek, Claude, Mahjoub, Omayma, Nessir, Louay Ben, Abramowitz, Sasha, de Kock, Ruan, Khlifi, Wiem, Rajaonarivonivelomanantsoa, Daniel, Toit, Simon Du, Fokam, Arnol, Singh, Siddarth, Sob, Ulrich Mbou, Chalumeau, Felix, Pretorius, Arnu

arXiv.org Artificial IntelligenceOct-31-2025

A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline autoregressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over long trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works -- SMAC, RWARE, and Multi-Agent MuJoCo -- covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming prior offline MARL methods and demonstrating robust generalisation across domains with many agents and long horizons. Finally, we introduce new datasets to push the limits of many-agent coordination in offline MARL, and demonstrate Oryx's superior ability to scale effectively in such settings.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2505.22151

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)

Add feedback

AutoLibra: Agent Metric Induction from Open-Ended Human Feedback

Zhu, Hao, Cuvin, Phil, Yu, Xinkai, Yan, Charlotte Ka Yee, Zhang, Jason, Yang, Diyi

arXiv.org Artificial IntelligenceOct-31-2025

Agents are predominantly evaluated and optimized via task success metrics, which are coarse, rely on manual design from experts, and fail to reward intermediate emergent behaviors. We propose **AutoLibra**, a framework for agent evaluation, that transforms open-ended human feedback *e.g.* "If you find that the button is disabled, don't click it again", or "This agent has too much autonomy to decide what to do on its own" into metrics for evaluating fine-grained behaviors in agent trajectories. AutoLibra accomplishes this by grounding feedback to an agent's behavior, clustering similar positive and negative behaviors, and creating concrete metrics with clear definitions and concrete examples, which can be used for prompting LLM-as-a-Judge as evaluators. We further propose two meta metrics to evaluate the alignment of a set of (induced) metrics with open feedback: "coverage" and "redundancy". Through optimizing these meta-metrics, we experimentally demonstrate AutoLibra's ability to induce more concrete agent evaluation metrics than the ones proposed in previous agent evaluation benchmarks and discover new metrics to analyze agents. We also present two applications of AutoLibra in agent improvement: First, we show that AutoLibra serve human prompt engineers for diagonalize agent failures and improve prompts iterative. Moreover, we find that AutoLibra can induce metrics for automatic optimization for agents, which makes agents improve through self-regulation. Our results suggest that AutoLibra is a powerful task-agnostic tool for evaluating and improving language agents.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.0282

Country:

Asia (0.46)
North America > United States (0.45)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.92)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation

Kumar, Ashwin, Yeoh, William

arXiv.org Artificial IntelligenceOct-31-2025

We introduce the General Incentives-based Framework for Fairness (GIFF), a novel approach for fair multi-agent resource allocation that infers fair decision-making from standard value functions. In resource-constrained settings, agents optimizing for efficiency often create inequitable outcomes. Our approach leverages the action-value (Q-)function to balance efficiency and fairness without requiring additional training. Specifically, our method computes a local fairness gain for each action and introduces a counterfactual advantage correction term to discourage over-allocation to already well-off agents. This approach is formalized within a centralized control setting, where an arbitrator uses the GIFF-modified Q-values to solve an allocation problem. Empirical evaluations across diverse domains, including dynamic ridesharing, homelessness prevention, and a complex job allocation task-demonstrate that our framework consistently outperforms strong baselines and can discover far-sighted, equitable policies. The framework's effectiveness is supported by a theoretical foundation; we prove its fairness surrogate is a principled lower bound on the true fairness improvement and that its trade-off parameter offers monotonic tuning. Our findings establish GIFF as a robust and principled framework for leveraging standard reinforcement learning components to achieve more equitable outcomes in complex multi-agent systems.

agent, artificial intelligence, fairness, (17 more...)

arXiv.org Artificial Intelligence

2510.2674

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Transportation > Ground > Road (0.67)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork

Zhang, Beiwen, Liang, Yongheng, Wu, Hejun

arXiv.org Artificial IntelligenceOct-30-2025

Multi-agent reinforcement learning (MARl) has achieved strong results in cooperative tasks but typically assumes fixed, fully controlled teams. Ad hoc teamwork (AHT) relaxes this by allowing collaboration with unknown partners, yet existing variants still presume shared conventions. We introduce Multil-party Ad Hoc Teamwork (MAHT), where controlled agents must coordinate with multiple mutually unfamiliar groups of uncontrolled teammates. To address this, we propose MARs, which builds a sparse skeleton graph and applies relational modeling to capture cross-group dvnamics. Experiments on MPE and starCralt ll show that MARs outperforms MARL and AHT baselines while converging faster.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2510.2534

Country:

North America > United States (0.94)
Europe > Middle East > Cyprus (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models

Ren, Juan, Dras, Mark, Naseem, Usman

arXiv.org Artificial IntelligenceOct-30-2025

Abstract--Agentic methods have emerged as a powerful and autonomous paradigm that enhances reasoning, collaboration, and adaptive control, enabling systems to coordinate and independently solve complex tasks. We extend this paradigm to safety alignment by introducing Agentic Moderation, a model-agnostic framework that leverages specialized agents to defend multimodal systems against jailbreak attacks. Unlike prior approaches that apply as a static layer over inputs or outputs and provide only binary classifications(safe or unsafe), our method integrates dynamic, cooperative agents,including Shield, Responder, Evaluator, and Reflector,to achieve context-aware and interpretable moderation. Extensive experiments across five datasets and four representative large vision-language models (L VLMs) demonstrate that our approach reduces the Attack Success Rate (ASR) by 7-19%, maintains a stable Non-Following Rate (NF), and improves the Refusal Rate (RR) by 4-20%, achieving robust, interpretable, and well-balanced safety performance. By harnessing the flexibility and reasoning capacity of agentic architectures, Agentic Moderation provides modular, scalable, and fine-grained safety enforcement, highlighting the broader potential of agentic systems as a foundation for automated safety governance. Large vision-language models (L VLMs) integrate visual and textual modalities, enabling richer multimodal reasoning and expanding their application scope. Malicious users can exploit cross-modal interactions and the continuous nature of visual embedding spaces, which makes adversarial defenses especially challenging. Cross-modality adversarial attacks exploit visual vulnerabilities and modality shifts in semantic meaning. Examples include pixel-level perturbations that embed harmful intent within images [1]-[3], malicious content rendered via typography or flowcharts [4], harmful behaviors that emerge only from the combination of benign-looking text and visual inputs, implicit cross-modal interactions that obscure adversarial intent [5], and hybrid or ensemble strategies that combine these mechanisms [6].

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2510.25179

Genre:

Research Report (0.68)
Workflow (0.67)

Industry: Information Technology > Security & Privacy (0.90)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles

Akman, Ahmet Onur, Psarou, Anastasia, Hoffmann, Michał, Gorczyca, Łukasz, Kowalski, Łukasz, Gora, Paweł, Jamróz, Grzegorz, Kucharski, Rafał

arXiv.org Artificial IntelligenceOct-29-2025

Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike for human drivers, these decisions can be made with collective, data-driven policies, developed using machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, multi-agent RL (MARL) algorithm implementations, three baseline methods, domain-specific performance metrics, and a modular configuration scheme. Our results show that, despite the lengthy and costly training, state-of-the-art MARL algorithms rarely outperformed humans. The experimental results reported in this paper initiate the first leaderboard for MARL in large-scale urban routing optimization. They reveal that current approaches struggle to scale, emphasizing the urgent need for advancements in this domain.

machine learning, reinforcement learning, scenario 1, (18 more...)

arXiv.org Artificial Intelligence

2505.17734

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Transportation > Ground > Road (0.68)
Consumer Products & Services > Travel (0.47)
Transportation > Infrastructure & Services (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.90)

Add feedback

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Song, Yueqi, Ramaneti, Ketan, Sheikh, Zaid, Chen, Ziru, Gou, Boyu, Xie, Tianbao, Xu, Yiheng, Zhang, Danyang, Gandhi, Apurva, Yang, Fan, Liu, Joseph, Ou, Tianyue, Yuan, Zhihao, Xu, Frank, Zhou, Shuyan, Wang, Xingyao, Yue, Xiang, Yu, Tao, Sun, Huan, Su, Yu, Neubig, Graham

arXiv.org Artificial IntelligenceOct-29-2025

Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agen-tic workflows, while remaining simple to parse and train on without engineering at a per-dataset level. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed SFT on these data, and demonstrated an average performance gain of 20% over corresponding base models, and delivers state-of-the-art or near-SOT A performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training. In contrast, post-training presents a much harder challenge: high-quality task-specific data must be carefully curated.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.24702

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models

Ismayilov, Murad, Meriaux, Edwin, Wen, Shuo, Dudek, Gregory

arXiv.org Artificial IntelligenceOct-29-2025

Abstract--Coordinating multiple autonomous agents in shared environments under decentralized conditions is a long-standing challenge in robotics and artificial intelligence. This work addresses the problem of decentralized goal assignment for multi-agent path planning, where agents independently generate ranked preferences over goals based on structured representations of the environment, including grid visualizations and scenario data. After this reasoning phase, agents exchange their goal rankings, and assignments are determined by a fixed, deterministic conflict-resolution rule (e.g., agent index ordering), without negotiation or iterative coordination. We systematically compare greedy heuristics, optimal assignment, and large language model (LLM)- based agents in fully observable grid-world settings. Our results show that LLM-based agents, when provided with well-designed prompts and relevant quantitative information, can achieve near-optimal makespans and consistently outperform traditional heuristics. These findings underscore the potential of language models for decentralized goal assignment in multi-agent path planning and highlight the importance of information structure in such systems.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.23824

Country: North America > Canada > Quebec > Montreal (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.86)

Add feedback

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

Liu, Genglin, Le, Vivian, Rahman, Salman, Kreiss, Elisa, Ghassemi, Marzyeh, Gabriel, Saadia

arXiv.org Artificial IntelligenceOct-28-2025

We present a novel, open-source social network simulation framework, MOSAIC, where generative language agents predict user behaviors such as liking, sharing, and flagging content. This simulation combines LLM agents with a directed social graph to analyze emergent deception behaviors and gain a better understanding of how users determine the veracity of online social content. By constructing user representations from diverse fine-grained personas, our system enables multi-agent simulations that model content dissemination and engagement dynamics at scale. Within this framework, we evaluate three different content moderation strategies with simulated misinformation dissemination, and we find that they not only mitigate the spread of non-factual content but also increase user engagement. In addition, we analyze the trajectories of popular content in our simulations, and explore whether simulation agents' articulated reasoning for their social interactions truly aligns with their collective engagement patterns. We open-source our simulation software to encourage further research within AI and social sciences.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.0783

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Government (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

Collective decision-making under changing social environments among agents adapted to sparse connectivity

Mann, Richard P.

arXiv.org Artificial IntelligenceOct-28-2025

Humans and other animals often follow the decisions made by others because these are indicative of the quality of possible choices, resulting in `social response rules': observed relationships between the probability that an agent will make a specific choice and the decisions other individuals have made. The form of social responses can be understood by considering the behaviour of rational agents that seek to maximise their expected utility using both social and private information. Previous derivations of social responses assume that agents observe all others within a group, but real interaction networks are often characterised by sparse connectivity. Here I analyse the observable behaviour of rational agents that attend to the decisions made by a subset of others in the group. This reveals an adaptive strategy in sparsely-connected networks based on highly-simplified social information: the difference in the observed number of agents choosing each option. Where agents employ this strategy, collective outcomes and decision-making efficacy are controlled by the social connectivity at the time of the decision, rather than that to which the agents are accustomed, providing an important caveat for sociality observed in the laboratory and suggesting a basis for the social dynamics of highly-connected online communities.

artificial intelligence, machine learning, social media, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1177/26339137221121347

2110.13543

Country:

North America > United States (0.29)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Add feedback