AITopics | Agent Societies

Collaborating Authors

Agent Societies

News Overviews Instructional Materials AI-Alerts Classics

From Explicit Communication to Tacit Cooperation:A Novel Paradigm for Cooperative MARL

Li, Dapeng, Xu, Zhiwei, Zhang, Bin, Fan, Guoliang

arXiv.org Artificial IntelligenceApr-28-2023

Centralized training with decentralized execution (CTDE) is a widely-used learning paradigm that has achieved significant success in complex tasks. However, partial observability issues and the absence of effectively shared signals between agents often limit its effectiveness in fostering cooperation. While communication can address this challenge, it simultaneously reduces the algorithm's practicality. Drawing inspiration from human team cooperative learning, we propose a novel paradigm that facilitates a gradual shift from explicit communication to tacit cooperation. In the initial training stage, we promote cooperation by sharing relevant information among agents and concurrently reconstructing this information using each agent's local trajectory. We then combine the explicitly communicated information with the reconstructed information to obtain mixed information. Throughout the training process, we progressively reduce the proportion of explicitly communicated information, facilitating a seamless transition to fully decentralized execution without communication. Experimental results in various scenarios demonstrate that the performance of our method without communication can approaches or even surpasses that of QMIX and communication-based methods.

information, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2304.14656

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
(12 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Decentralized Inference via Capability Type Structures in Cooperative Multi-Agent Systems

Jin, Charles, Hong, Zhang-Wei, Arthaud, Farid, Orzech, Idan, Rinard, Martin

arXiv.org Artificial IntelligenceApr-27-2023

This work studies the problem of ad hoc teamwork in teams composed of agents with differing computational capabilities. We consider cooperative multi-player games in which each agent's policy is constrained by a private capability parameter, and agents with higher capabilities are able to simulate the behavior of agents with lower capabilities (but not vice-versa). To address this challenge, we propose an algorithm that maintains a belief over the other agents' capabilities and incorporates this belief into the planning process. Our primary innovation is a novel framework based on capability type structures, which ensures that the belief updates remain consistent and informative without constructing the infinite hierarchy of beliefs. We also extend our techniques to settings where the agents' observations are subject to noise. We provide examples of games in which deviations in capability between oblivious agents can lead to arbitrarily poor outcomes, and experimentally validate that our capability-aware algorithm avoids the anti-cooperative behavior of the naive approach in these toy settings as well as a more complex cooperative checkers environment.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.13957

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)
Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making

Lu, Junlin

arXiv.org Artificial IntelligenceApr-27-2023

It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations. The algorithm is evaluated in three environments with two baseline methods. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system, where one of the agents is enabled to infer the preferences of an opponent using our preference inference algorithm.

agent, algorithm, artificial intelligence, (12 more...)

arXiv.org Artificial Intelligence

2304.14126

Country:

Europe > Ireland (0.15)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.52)

Add feedback

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Haarnoja, Tuomas, Moran, Ben, Lever, Guy, Huang, Sandy H., Tirumala, Dhruva, Wulfmeier, Markus, Humplik, Jan, Tunyasuvunakool, Saran, Siegel, Noah Y., Hafner, Roland, Bloesch, Michael, Hartikainen, Kristian, Byravan, Arunkumar, Hasenclever, Leonard, Tassa, Yuval, Sadeghi, Fereshteh, Batchelor, Nathan, Casarini, Federico, Saliceti, Stefano, Game, Charles, Sreendra, Neil, Patel, Kushal, Gwira, Marlon, Huber, Andrea, Hurley, Nicole, Nori, Francesco, Hadsell, Raia, Heess, Nicolas

arXiv.org Artificial IntelligenceApr-26-2023

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2304.13653

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Zhang, Dongkun, Xue, Jintao, Cui, Yuxiang, Wang, Yunkai, Liu, Eryun, Jing, Wei, Chen, Junbo, Xiong, Rong, Wang, Yue

arXiv.org Artificial IntelligenceApr-25-2023

Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of traffic flow is essential and previous studies are unable to balance interaction and safety-criticism. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assign Social Value Orientation (SVO) as context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy where upper-level policy communicates genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the hierarchical policy where upper-level communicates mistaken SVOs, taken by the lower-level policy trained in Stage 1. Driving policy is adversarially trained through a zero-sum game formulation with upper-level policies, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive experiments on cross-validation verify the superior zero-shot transfer performance of our method.

artificial intelligence, machine learning, traffic flow, (17 more...)

arXiv.org Artificial Intelligence

2304.12821

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)

Add feedback

A Spatial Calibration Method for Robust Cooperative Perception

Song, Zhiying, Xie, Tenghui, Zhang, Hailiang, Wen, Fuxi, Li, Jun

arXiv.org Artificial IntelligenceApr-25-2023

Cooperative perception is a promising technique for enhancing the perception capabilities of automated vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails high costs associated with navigation systems. Moreover, signal drift resulting from factors such as occlusion and multipath effects can compromise the stability of the positioning information. Hence, a low-cost and robust method is required to calibrate relative pose information for multi-agent cooperative perception. In this paper, we propose a simple but effective inter-agent object association approach (CBM), which constructs contexts using the detected bounding boxes, followed by local context matching and global consensus maximization. Based on the matched correspondences, optimal relative pose transform is estimated, followed by cooperative perception fusion. Extensive experimental studies are conducted on both the simulated and real-world datasets, high object association precision and decimeter level relative pose calibration accuracy is achieved among the cooperating agents even with larger inter-agent localization errors. Furthermore, the proposed approach outperforms the state-of-the-art methods in terms of object association and relative pose estimation accuracy, as well as the robustness of cooperative perception against the pose errors of the connected agents. The code will be available at https://github.com/zhyingS/CBM.

artificial intelligence, machine learning, perception, (19 more...)

arXiv.org Artificial Intelligence

2304.12033

Country:

North America > United States > California > Los Angeles County > Culver City (0.05)
North America > United States > Pennsylvania (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Promising Solution (0.68)
Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (0.93)
Information Technology (0.68)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

Fiscko, Carmel, Kar, Soummya, Sinopoli, Bruno

arXiv.org Artificial IntelligenceApr-24-2023

This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The controller's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. Finding an optimal policy for any specific dropout realization is a special case of this problem. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP comprised of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all $2^N$ realizations of the system, where $N$ denotes the number of agents. More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system, meaning that robust policies can be found before dropout occurs. This fact is used to propose a policy importance sampling (IS) routine that performs policy evaluation for dropout scenarios while controlling the existing system with good pre-dropout policies. The policy IS routine produces value estimates for both the robust MDP and specific post-dropout system realizations and is justified with exponential confidence bounds. Finally, the utility of this approach is verified in simulation, showing how structural properties of agent dropout can help a controller find good post-dropout policies before dropout occurs.

agent, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2304.12458

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

SEA: A Spatially Explicit Architecture for Multi-Agent Reinforcement Learning

Li, Dapeng, Xu, Zhiwei, Zhang, Bin, Fan, Guoliang

arXiv.org Artificial IntelligenceApr-24-2023

Spatial information is essential in various fields. How to explicitly model according to the spatial location of agents is also very important for the multi-agent problem, especially when the number of agents is changing and the scale is enormous. Inspired by the point cloud task in computer vision, we propose a spatial information extraction structure for multi-agent reinforcement learning in this paper. Agents can effectively share the neighborhood and global information through a spatially encoder-decoder structure. Our method follows the centralized training with decentralized execution (CTDE) paradigm. In addition, our structure can be applied to various existing mainstream reinforcement learning algorithms with minor modifications and can deal with the problem with a variable number of agents. The experiments in several multi-agent scenarios show that the existing methods can get convincing results by adding our spatially explicit architecture.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2304.12532

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.70)

Add feedback

Adaptive patch foraging in deep reinforcement learning agents

Wispinski, Nathan J., Butcher, Andrew, Mathewson, Kory W., Chapman, Craig S., Botvinick, Matthew M., Pilarski, Patrick M.

arXiv.org Artificial IntelligenceApr-21-2023

Patch foraging is one of the most heavily studied behavioral optimization challenges in biology. However, despite its importance to biological intelligence, this behavioral optimization problem is understudied in artificial intelligence research. Patch foraging is especially amenable to study given that it has a known optimal solution, which may be difficult to discover given current techniques in deep reinforcement learning. Here, we investigate deep reinforcement learning agents in an ecological patch foraging task. For the first time, we show that machine learning agents can learn to patch forage adaptively in patterns similar to biological foragers, and approach optimal patch foraging behavior when accounting for temporal discounting. Finally, we show emergent internal dynamics in these agents that resemble single-cell recordings from foraging non-human primates, which complements experimental and theoretical work on the neural mechanisms of biological foraging. This work suggests that agents interacting in complex environments with ecologically valid pressures arrive at common solutions, suggesting the emergence of foundational computations behind adaptive, intelligent behavior in both biological and artificial agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2210.08085

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.05)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Add feedback

Reducing Opinion Echo-Chambers by Intelligent Placement of Moderate-Minded Agents

Jana, Prithwish, Choudhury, Romit Roy, Ganguly, Niloy

arXiv.org Artificial IntelligenceApr-21-2023

In the era of social media, people frequently share their own opinions online on various issues and also in the way, get exposed to others' opinions. Be it for selective exposure of news feed recommendation algorithms or our own inclination to listen to opinions that support ours, the result is that we get more and more exposed to opinions closer to ours. Further, any population is inherently heterogeneous i.e. people will hold a varied range of opinions regarding a topic and showcase a varied range of openness to get influenced by others. In this paper, we demonstrate the different behavior put forward by open- and close-minded agents towards an issue, when allowed to freely intermix and communicate. We have shown that the intermixing among people leads to formation of opinion echo chambers i.e. a small closed network of people who hold similar opinions and are not affected by opinions of people outside the network. Echo chambers are evidently harmful for a society because it inhibits free healthy communication among all and thus, prevents exchange of opinions, spreads misinformation and increases extremist beliefs. This calls for reduction in echo chambers, because a total consensus of opinion is neither possible nor is welcome. We show that the number of echo chambers depends on the number of close-minded agents and cannot be lessened by increasing the number of open-minded agents. We identify certain 'moderate'-minded agents, who possess the capability of manipulating and reducing the number of echo chambers. The paper proposes an algorithm for intelligent placement of moderate-minded agents in the opinion-time spectrum by which the opinion echo chambers can be maximally reduced. With various experimental setups, we demonstrate that the proposed algorithm fares well when compared to placement of other agents (open- or close-minded) and random placement of 'moderate'-minded agents.

agent, artificial intelligence, social media, (18 more...)

arXiv.org Artificial Intelligence

2304.10745

Country:

Asia > India > West Bengal > Kharagpur (0.04)
North America > United States > Illinois (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre:

Personal (0.46)
Research Report (0.40)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Add feedback