AITopics | Agents

Collaborating Authors

Agents

News Overviews Instructional Materials AI-Alerts Classics

Off-Policy Correction For Multi-Agent Reinforcement Learning

Zawalski, Michał, Osiński, Błażej, Michalewski, Henryk, Miłoś, Piotr

arXiv.org Artificial IntelligenceNov-22-2021

Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded - we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them.

algorithm, learning, ma-trace, (12 more...)

arXiv.org Artificial Intelligence

2111.11229

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Poland > Masovia Province > Warsaw (0.05)
(11 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Multi-lingual agents through multi-headed neural networks

Thomas, J. D., Santos-Rodríguez, R., Piechocki, R., Anca, M.

arXiv.org Artificial IntelligenceNov-22-2021

This paper considers cooperative Multi-Agent Reinforcement Learning, focusing on emergent communication in settings where multiple pairs of independent learners interact at varying frequencies. In this context, multiple distinct and incompatible languages can emerge. When an agent encounters a speaker of an alternative language, there is a requirement for a period of adaptation before they can efficiently converse. This adaptation results in the emergence of a new language and the forgetting of the previous language. In principle, this is an example of the Catastrophic Forgetting problem which can be mitigated by enabling the agents to learn and maintain multiple languages. We take inspiration from the Continual Learning literature and equip our agents with multi-headed neural networks which enable our agents to be multi-lingual. Our method is empirically validated within a referential MNIST based communication game and is shown to be able to maintain multiple languages where existing approaches cannot.

agent, communication, listener, (16 more...)

arXiv.org Artificial Intelligence

2111.11129

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)

Add feedback

Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration

Zheng, Lulu, Chen, Jiarui, Wang, Jianhao, He, Jiamin, Hu, Yujing, Chen, Yingfeng, Fan, Changjie, Gao, Yang, Zhang, Chongjie

arXiv.org Artificial IntelligenceNov-22-2021

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) still remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local action-observation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function captures the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration to new or promising states. We illustrate the advantages of our method by didactic examples, and demonstrate its significant outperformance over state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.

agent, exploration, intrinsic reward, (12 more...)

arXiv.org Artificial Intelligence

2111.11032

Country:

North America > Canada > Alberta (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Add feedback

Self Learning AI-Agents Part I: Markov Decision Processes

#artificialintelligenceNov-21-2021, 22:05:12 GMT

A Markov Decision Processes (MDP) is a discrete time stochastic control process. MDP is the best approach we have so far to model the complex environment of an AI agent. Every problem that the agent aims to solve can be considered as a sequence of states S1, S2, S3, … Sn (A state may be for example a Go/chess board configuration). The agent takes actions and moves from one state to an other. In the following you will learn the mathematics that determine which action the agent must take in any given situation.

agent, markov decision process, markov process, (6 more...)

#artificialintelligence

Genre: Instructional Material (0.58)

Industry: Leisure & Entertainment > Games > Chess (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

A Software Tool for Evaluating Unmanned Autonomous Systems

Homaifar, Abdollah, Karimoddini, Ali, Heiges, Mike, Khan, Mubbashar A., Erol, Berat A., Nazmi, Shabnam

arXiv.org Artificial IntelligenceNov-21-2021

The North Carolina Agriculture and Technical State University (NC A&T) in collaboration with Georgia Tech Research Institute (GTRI) has developed methodologies for creating simulation-based technology tools that are capable of inferring the perceptions and behavioral states of autonomous systems. These methodologies have the potential to provide the Test and Evaluation (T&E) community at the Department of Defense (DoD) with a greater insight into the internal processes of these systems. The methodologies use only external observations and do not require complete knowledge of the internal processing of and/or any modifications to the system under test. This paper presents an example of one such simulation-based technology tool, named as the Data-Driven Intelligent Prediction Tool (DIPT). DIPT was developed for testing a multi-platform Unmanned Aerial Vehicle (UAV) system capable of conducting collaborative search missions. DIPT's Graphical User Interface (GUI) enables the testers to view the aircraft's current operating state, predicts its current target-detection status, and provides reasoning for exhibiting a particular behavior along with an explanation of assigning a particular task to it.

distribution statement, navair public release, tester, (13 more...)

arXiv.org Artificial Intelligence

2111.10871

Country:

North America > United States > North Carolina > Guilford County > Greensboro (0.04)
North America > United States > Texas (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Aerospace & Defense (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Add feedback

Calculus of Consent via MARL: Legitimating the Collaborative Governance Supplying Public Goods

Hu, Yang, Zhu, Zhui, Song, Sirui, Liu, Xue, Yu, Yang

arXiv.org Artificial IntelligenceNov-20-2021

Public policies that supply public goods, especially those involve collaboration by limiting individual liberty, always give rise to controversies over governance legitimacy. Multi-Agent Reinforcement Learning (MARL) methods are appropriate for supporting the legitimacy of the public policies that supply public goods at the cost of individual interests. Among these policies, the inter-regional collaborative pandemic control is a prominent example, which has become much more important for an increasingly inter-connected world facing a global pandemic like COVID-19. Different patterns of collaborative strategies have been observed among different systems of regions, yet it lacks an analytical process to reason for the legitimacy of those strategies. In this paper, we use the inter-regional collaboration for pandemic control as an example to demonstrate the necessity of MARL in reasoning, and thereby legitimizing policies enforcing such inter-regional collaboration. Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for collaborative supply of public goods. Different optimal policies are learned by our MARL agents under different collaboration levels, which change in an interpretable pattern of collaboration that helps to balance the losses suffered by regions of different types, and consequently promotes the overall welfare. Meanwhile, policies learned with higher collaboration levels yield higher global rewards, which illustrates the benefit of, and thus provides a novel justification for the legitimacy of, promoting inter-regional collaboration. Therefore, our method shows the capability of MARL in computationally modeling and supporting the theory of calculus of consent, developed by Nobel Prize winner J. M. Buchanan.

agent, collaboration level, pandemic control policy, (13 more...)

arXiv.org Artificial Intelligence

2111.10627

Country:

North America > Canada > Quebec > Montreal (0.28)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre:

Research Report (0.64)
Personal > Honors (0.54)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.89)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An Activity-Based Model of Transport Demand for Greater Melbourne

Both, Alan, Singh, Dhirendra, Jafari, Afshin, Giles-Corti, Billie, Gunn, Lucy

arXiv.org Artificial IntelligenceNov-19-2021

In this paper, we present an algorithm for creating a synthetic population for the Greater Melbourne area using a combination of machine learning, probabilistic, and gravity-based approaches. We combine these techniques in a hybrid model with three primary innovations: 1. when assigning activity patterns, we generate individual activity chains for every agent, tailored to their cohort; 2. when selecting destinations, we aim to strike a balance between the distance-decay of trip lengths and the activity-based attraction of destination locations; and 3. we take into account the number of trips remaining for an agent so as to ensure they do not select a destination that would be unreasonable to return home from. Our method is completely open and replicable, requiring only publicly available data to generate a synthetic population of agents compatible with commonly used agent-based modeling software such as MATSim. The synthetic population was found to be accurate in terms of distance distribution, mode choice, and destination choice for a variety of population sizes.

activity chain, probability, time bin, (15 more...)

arXiv.org Artificial Intelligence

2111.10061

Country:

Oceania > Australia > Victoria > Melbourne (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > Canada > Ontario > Toronto (0.04)
(11 more...)

Genre: Workflow (0.93)

Industry:

Transportation (1.00)
Government > Regional Government > Oceania Government > Australia Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.87)

Add feedback

Finding Useful Predictions by Meta-gradient Descent to Improve Decision-making

Kearney, Alex, Koop, Anna, Günther, Johannes, Pilarski, Patrick M.

arXiv.org Artificial IntelligenceNov-18-2021

In computational reinforcement learning, a growing body of work seeks to express an agent's model of the world through predictions about future sensations. In this manuscript we focus on predictions expressed as General Value Functions: temporally extended estimates of the accumulation of a future signal. One challenge is determining from the infinitely many predictions that the agent could possibly make which might support decision-making. In this work, we contribute a meta-gradient descent method by which an agent can directly specify what predictions it learns, independent of designer instruction. To that end, we introduce a partially observable domain suited to this investigation. We then demonstrate that through interaction with the environment an agent can independently select predictions that resolve the partial-observability, resulting in performance similar to expertly chosen value functions. By learning, rather than manually specifying these predictions, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.

agent, meta-gradient descent, prediction, (13 more...)

arXiv.org Artificial Intelligence

2111.11212

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.15)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Assisted Robust Reward Design

He, Jerry Zhi-Yang, Dragan, Anca D.

arXiv.org Artificial IntelligenceNov-18-2021

Real-world robotic tasks require complex reward functions. When we define the problem the robot needs to solve, we pretend that a designer specifies this complex reward exactly, and it is set in stone from then on. In practice, however, reward design is an iterative process: the designer chooses a reward, eventually encounters an "edge-case" environment where the reward incentivizes the wrong behavior, revises the reward, and repeats. What would it mean to rethink robotics problems to formally account for this iterative nature of reward design? We propose that the robot not take the specified reward for granted, but rather have uncertainty about it, and account for the future design iterations as future evidence. We contribute an Assisted Reward Design method that speeds up the design process by anticipating and influencing this future evidence: rather than letting the designer eventually encounter failure cases and revise the reward then, the method actively exposes the designer to such environments during the development phase. We test this method in a simplified autonomous driving task and find that it more quickly improves the car's behavior in held-out environments by proposing environments that are "edge cases" for the current reward.

designer, reward design, reward function, (15 more...)

arXiv.org Artificial Intelligence

2111.09884

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Robotics & Automation (0.69)
Transportation > Ground > Road (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.36)

Add feedback

Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates

Kantack, Nicholas

arXiv.org Artificial IntelligenceNov-18-2021

In 2021 the Johns Hopkins University Applied Physics Laboratory held an internal challenge to develop artificially intelligent (AI) agents that could excel at the collaborative card game Hanabi. Agents were evaluated on their ability to play with human players whom the agents had never previously encountered. This study details the development of the agent that won the challenge by achieving a human-play average score of 16.5, outperforming the current state-of-the-art for human-bot Hanabi scores. The winning agent's development consisted of observing and accurately modeling the author's decision making in Hanabi, then training with a behavioral clone of the author. Notably, the agent discovered a human-complementary play style by first mimicking human decision making, then exploring variations to the human-like strategy that led to higher simulated human-bot scores. This work examines in detail the design and implementation of this human compatible Hanabi teammate, as well as the existence and implications of human-complementary strategies and how they may be explored for more successful applications of AI in human machine teams.

agent, cyclone agent, teammate, (17 more...)

arXiv.org Artificial Intelligence

2111.098

Country: North America > United States > Maryland > Prince George's County > Laurel (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Add feedback