Agent Societies
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
Lu, Yaxi, Yang, Shenzhi, Qian, Cheng, Chen, Guirong, Luo, Qinyu, Wu, Yesai, Wang, Huadong, Cong, Xin, Zhang, Zhong, Lin, Yankai, Liu, Weiwen, Wang, Yasheng, Liu, Zhiyuan, Liu, Fangming, Sun, Maosong
Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and close-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
Hou, Xiaoxuan, Yuan, Jiayi, Leibo, Joel Z., Jaques, Natasha
InvestESG is a novel multi-agent reinforcement learning (MARL) benchmark designed to study the impact of Environmental, Social, and Governance (ESG) disclosure mandates on corporate climate investments. Supported by both PyTorch and JAX implementation, the benchmark models an intertemporal social dilemma where companies balance short-term profit losses from climate mitigation efforts and long-term benefits from reducing climate risk, while ESG-conscious investors attempt to influence corporate behavior through their investment decisions, in a scalable and hardware-accelerated manner. Companies allocate capital across mitigation, greenwashing, and resilience, with varying strategies influencing climate outcomes and investor preferences. Our experiments show that without ESG-conscious investors with sufficient capital, corporate mitigation efforts remain limited under the disclosure mandate. However, when a critical mass of investors prioritizes ESG, corporate cooperation increases, which in turn reduces climate risks and enhances long-term financial stability. Additionally, providing more information about global climate risks encourages companies to invest more in mitigation, even without investor involvement. Our findings align with empirical research using real-world data, highlighting MARL's potential to inform policy by providing insights into large-scale socio-economic challenges through efficient testing of alternative policy and market designs.
Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach
Huang, Tianyi, Somasundaram, Arya
Large Language Models (LLMs) often perpetuate biases in pronoun usage, leading to misrepresentation or exclusion of queer individuals. This paper addresses the specific problem of biased pronoun usage in LLM outputs, particularly the inappropriate use of traditionally gendered pronouns ("he," "she") when inclusive language is needed to accurately represent all identities. We introduce a collaborative agent pipeline designed to mitigate these biases by analyzing and optimizing pronoun usage for inclusivity. Our multi-agent framework includes specialized agents for both bias detection and correction. Experimental evaluations using the Tango dataset-a benchmark focused on gender pronoun usage-demonstrate that our approach significantly improves inclusive pronoun classification, achieving a 32.6 percentage point increase over GPT-4o in correctly disagreeing with inappropriate traditionally gendered pronouns $(\chi^2 = 38.57, p < 0.0001)$. These results accentuate the potential of agent-driven frameworks in enhancing fairness and inclusivity in AI-generated content, demonstrating their efficacy in reducing biases and promoting socially responsible AI.
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Anand, Emile, Karmarkar, Ishani, Qu, Guannan
Reinforcement Learning (RL) has become a popular learning framework to solve sequential decision making problems in unknown environments, and has achieved tremendous success in a wide array of domains such as playing the game of Go (Silver et al., 2016), robotic control (Kober et al., 2013), and autonomous driving (Kiran et al., 2022; Lin et al., 2023). A critical feature of most real-world systems is their uncertain nature, and consequently RL has emerged as a powerful tool for learning optimal policies for multi-agent systems to operate in unknown environments (Kim & Giannakis, 2017; Zhang et al., 2021; Lin et al., 2024; Anand & Qu, 2024). While the early literature on RL predominantly focused on the single-agent setting, multi-agent reinforcement learning (MARL) has also recently achieved impressive successes in a broad range of areas, such as coordination of robotic swarms (Preiss et al., 2017), self-driving vehicles (DeWeese & Qu, 2024), real-time bidding (Jin et al., 2018), ride-sharing (Li et al., 2019), and stochastic games (Jin et al., 2020). Despite growing interest in multi-agent RL (MARL), extending RL to multi-agent settings poses significant computational challenges due to the curse of dimensionality (Sayin et al., 2021). Even if the individual agents' state or action spaces are small, the global state space or action space can take values from a set with size that is exponentially large as a function of the number of agents.
Towards Fault Tolerance in Multi-Agent Reinforcement Learning
Shi, Yuchen, Pei, Huaxin, Feng, Liang, Zhang, Yi, Yao, Danya
Agent faults pose a significant threat to the performance of multi-agent reinforcement learning (MARL) algorithms, introducing two key challenges. First, agents often struggle to extract critical information from the chaotic state space created by unexpected faults. Second, transitions recorded before and after faults in the replay buffer affect training unevenly, leading to a sample imbalance problem. To overcome these challenges, this paper enhances the fault tolerance of MARL by combining optimized model architecture with a tailored training data sampling strategy. Specifically, an attention mechanism is incorporated into the actor and critic networks to automatically detect faults and dynamically regulate the attention given to faulty agents. Additionally, a prioritization mechanism is introduced to selectively sample transitions critical to current training needs. To further support research in this area, we design and open-source a highly decoupled code platform for fault-tolerant MARL, aimed at improving the efficiency of studying related problems. Experimental results demonstrate the effectiveness of our method in handling various types of faults, faults occurring in any agent, and faults arising at random times.
RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents
Zifeng, Shi, Meiqin, Liu, Senlin, Zhang, Ronghao, Zheng, Shanling, Dong
In recent years, model-based reinforcement learning (MBRL) has emerged as a solution to address sample complexity in multi-agent reinforcement learning (MARL) by modeling agent-environment dynamics to improve sample efficiency. However, most MBRL methods assume complete and continuous observations from each agent during the inference stage, which can be overly idealistic in practical applications. A novel model-based MARL approach called RMIO is introduced to address this limitation, specifically designed for scenarios where observation is lost in some agent. RMIO leverages the world model to reconstruct missing observations, and further reduces reconstruction errors through inter-agent information integration to ensure stable multi-agent decision-making. Secondly, unlike CTCE methods such as MAMBA, RMIO adopts the CTDE paradigm in standard environment, and enabling limited communication only when agents lack observation data, thereby reducing reliance on communication. Additionally, RMIO improves asymptotic performance through strategies such as reward smoothing, a dual-layer experience replay buffer, and an RNN-augmented policy model, surpassing previous work. Our experiments conducted in both the SMAC and MaMuJoCo environments demonstrate that RMIO outperforms current state-of-the-art approaches in terms of asymptotic convergence performance and policy robustness, both in standard mission settings and in scenarios involving observation loss.
Integrating Transit Signal Priority into Multi-Agent Reinforcement Learning based Traffic Signal Control
Kwesiga, Dickness Kakitahi, Vishnoi, Suyash Chandra, Guin, Angshuman, Hunter, Michael
This study integrates Transit Signal Priority (TSP) into multi-agent reinforcement learning (MARL) based traffic signal control. The first part of the study develops adaptive signal control based on MARL for a pair of coordinated intersections in a microscopic simulation environment. The two agents, one for each intersection, are centrally trained using a value decomposition network (VDN) architecture. The trained agents show slightly better performance compared to coordinated actuated signal control based on overall intersection delay at v/c of 0.95. In the second part of the study the trained signal control agents are used as background signal controllers while developing event-based TSP agents. In one variation, independent TSP agents are formulated and trained under a decentralized training and decentralized execution (DTDE) framework to implement TSP at each intersection. In the second variation, the two TSP agents are centrally trained under a centralized training and decentralized execution (CTDE) framework and VDN architecture to select and implement coordinated TSP strategies across the two intersections. In both cases the agents converge to the same bus delay value, but independent agents show high instability throughout the training process. For the test runs, the two independent agents reduce bus delay across the two intersections by 22% compared to the no TSP case while the coordinated TSP agents achieve 27% delay reduction. In both cases, there is only a slight increase in delay for a majority of the side street movements.
OASIS: Open Agent Social Interaction Simulations with One Million Agents
Yang, Ziyi, Zhang, Zaibin, Zheng, Zirui, Jiang, Yuxian, Gan, Ziyue, Wang, Zhiyu, Ling, Zijian, Chen, Jinsong, Ma, Martz, Dong, Bowen, Gupta, Prateek, Hu, Shuyue, Yin, Zhenfei, Li, Guohao, Jia, Xu, Wang, Lijun, Ghanem, Bernard, Lu, Huchuan, Lu, Chaochao, Ouyang, Wanli, Qiao, Yu, Torr, Philip, Shao, Jing
There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a particular scenario, making it time-consuming and resource-intensive to explore other phenomena using the same ABM. Additionally, these models simulate only a limited number of agents, whereas real-world social media platforms involve millions of users. To this end, we propose OASIS, a generalizable and scalable social media simulator. OASIS is designed based on real-world social media platforms, incorporating dynamically updated environments (i.e., dynamic social networks and post information), diverse action spaces (i.e., following, commenting), and recommendation systems (i.e., interest-based and hot-score-based). Additionally, OASIS supports large-scale user simulations, capable of modeling up to one million users. With these features, OASIS can be easily extended to different social media platforms to study large-scale group phenomena and behaviors. We replicate various social phenomena, including information spreading, group polarization, and herd effects across X and Reddit platforms. Moreover, we provide observations of social phenomena at different agent group scales. We observe that the larger agent group scale leads to more enhanced group dynamics and more diverse and helpful agents' opinions. These findings demonstrate OASIS's potential as a powerful tool for studying complex systems in digital environments.
Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete?
Douglas, Connor, Provost, Foster, Sundararajan, Arun
Algorithmic agents are used in a variety of competitive decision settings, notably in making pricing decisions in contexts that range from online retail to residential home rentals. Business managers, algorithm designers, legal scholars, and regulators alike are all starting to consider the ramifications of "algorithmic collusion." We study the emergent behavior of multi-armed bandit machine learning algorithms used in situations where agents are competing, but they have no information about the strategic interaction they are engaged in. Using a general-form repeated Prisoner's Dilemma game, agents engage in online learning with no prior model of game structure and no knowledge of competitors' states or actions (e.g., no observation of competing prices). We show that these context-free bandits, with no knowledge of opponents' choices or outcomes, still will consistently learn collusive behavior - what we call "naive collusion." We primarily study this system through an analytical model and examine perturbations to the model through simulations. Our findings have several notable implications for regulators. First, calls to limit algorithms from conditioning on competitors' prices are insufficient to prevent algorithmic collusion. This is a direct result of collusion arising even in the naive setting. Second, symmetry in algorithms can increase collusion potential. This highlights a new, simple mechanism for "hub-and-spoke" algorithmic collusion. A central distributor need not imbue its algorithm with supra-competitive tendencies for apparent collusion to arise; it can simply arise by using certain (common) machine learning algorithms. Finally, we highlight that collusive outcomes depend starkly on the specific algorithm being used, and we highlight market and algorithmic conditions under which it will be unknown a priori whether collusion occurs.
MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning
Lică, Mircea, Shirekar, Ojas, Colle, Baptiste, Raman, Chirag
Contemporary embodied agents, such as Voyager in Minecraft, have demonstrated promising capabilities in open-ended individual learning. However, when powered with open large language models (LLMs), these agents often struggle with rudimentary tasks, even when fine-tuned on domain-specific knowledge. These advancements enable agents to reason about their and others' mental states, empirically addressing two prevalent failure modes: false beliefs and faulty task executions. The development of generally capable agents marks a significant shift in advancing artificial intelligence, transitioning from assimilating data to generating novel knowledge through embodied interactions with open-ended environments (Kolve et al., 2017; Savva et al., 2019; Puig et al., 2018; Shridhar et al., 2020). Classical approaches leveraging reinforcement learning (Schulman et al., 2017; Hafner et al., 2023) and imitation learning (Zare et al., 2024) often struggle with generalization and exploration, as agents tend to converge on repetitive behaviors in static environments (Cobbe et al., 2019). To address these limitations, researchers have sought to emulate human-like lifelong learning capabilities, developing systems that can continuously acquire, update, and transfer knowledge over extended periods (Parisi et al., 2019; Wang et al., 2023b).The advent of large language models (LLMs) has accelerated this pursuit, enabling the development of agents such as Voyager (Wang et al., 2023a) that can apply internet-scale knowledge to continuously explore, plan, and acquire new skills in partially observable, open-ended environments such as Minecraft. Despite their promise, we argue that state-of-the-art lifelong learning agents like Voyager face a crucial limitation: they learn in isolation, neglecting a fundamental aspect of human intelligence--the social context. So central is the social context to our existence, that the Social Intelligence Hypothesis posits that our cognitive capabilities evolved primarily to navigate the complexities of social life (Humphrey, 1976; Dunbar, 1998). This isolated learning becomes particularly problematic when coupled with these agents' reliance on closed LLM) like GPT-4. Wang et al. (2023a) note that "VOYAGER requires Hey! I need help with Sure!