AITopics | Agents


Agents


A Joint Learning and Communication Framework for Multi-Agent Reinforcement Learning over Noisy Channels

Tung, Tze-Yang, Pujol, Joan Roig, Kobus, Szymon, Gunduz, Deniz

arXiv.org Artificial Intelligence, Jan-2-2021

We propose a novel formulation of the "effectiveness problem" in communications, put forth by Shannon and Weaver in their seminal work [2], by considering multiple agents communicating over a noisy channel in order to achieve better coordination and cooperation in a multi-agent reinforcement learning (MARL) framework. Specifically, we consider a multi-agent partially observable Markov decision process (MA-POMDP), in which the agents, in addition to interacting with the environment, can also communicate with each other over a noisy communication channel. The noisy communication channel is considered explicitly as part of the dynamics of the environment, and the message each agent sends is part of the action that the agent can take. As a result, the agents learn not only to collaborate with each other but also to communicate "effectively" over a noisy channel. This framework generalizes both the traditional communication problem, where the main goal is to convey a message reliably over a noisy channel, and the "learning to communicate" framework that has received recent attention in the MARL literature, where the underlying communication channels are assumed to be error-free. We show via examples that the joint policy learned using the proposed framework is superior to one in which communication is considered separately from the underlying MA-POMDP. This is a very powerful framework, which has many real-world applications, from autonomous vehicle planning to drone swarm control, and opens up the rich toolbox of deep reinforcement learning for the design of multi-user communication systems. This work was supported in part by the European Research Council (ERC) Starting Grant BEACON (grant agreement no. An earlier version of this work was presented at the IEEE Global Communications Conference (GLOBECOM) in December 2020 [1]. Communication is essential for our society. 
Humans use language to communicate ideas, which has given rise to complex social structures, and scientists have observed either gestural or vocal communication in other animal groups, complexity of which increases with the complexity of the social structure of the group [3].
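
The idea that the channel belongs to the environment dynamics, and that each message is part of an agent's action, can be illustrated with a small sketch. This is not the authors' implementation: the binary symmetric channel, the two-agent setup, and the names `binary_symmetric_channel` and `NoisyChannelEnv` are all assumptions chosen for illustration, since the abstract only speaks of a "noisy channel".

```python
import random


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (assumed channel model)."""
    return [b ^ (rng.random() < flip_prob) for b in bits]


class NoisyChannelEnv:
    """Hypothetical two-agent toy environment whose step function includes the channel."""

    def __init__(self, flip_prob=0.1, seed=0):
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)
        self.positions = [0, 0]

    def step(self, actions):
        # Each action is a (move, message_bits) pair: the message is part of the action.
        received = []
        for i, (move, message) in enumerate(actions):
            self.positions[i] += move
            # The channel noise is applied inside the environment transition.
            received.append(
                binary_symmetric_channel(message, self.flip_prob, self.rng)
            )
        # Each agent observes its own position and the noisy message sent by the other agent.
        return [(self.positions[i], received[1 - i]) for i in range(2)]
```

With `flip_prob=0.0` the sketch reduces to the error-free "learning to communicate" setting; any larger value forces a learned policy to account for corrupted messages when choosing what to send.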

agent, communication, communication channel, (15 more...)

arXiv.org Artificial Intelligence

2101.10369

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


If You're Happy, Then You Know It: The Logic of Happiness... and Sadness

Azimipour, Sanaz, Naumov, Pavel

arXiv.org Artificial Intelligence, Jan-2-2021

To be able to understand and predict human actions, artificial agents must be able to identify, comprehend, and reason about human emotions. Different formal models of human emotions have been studied in AI literature. Doyle, Shoham, and Wellman propose a logic of relative desire [1]. Lang, Van Der Torre, and Weydert introduce utilitarian desires [2]. Meyer states logical principles aiming at capturing anger and fear [3]. Steunebrink, Dastani, and Meyer expand this work to hope [4]. Adam, Herzig, and Longin propose formal definitions of hope, fear, relief, disappointment, resentment, gloating, pride, shame, admiration, reproach, gratification, remorse, gratitude, and anger [5].

definition 2, pavel, restaurant, (14 more...)

arXiv.org Artificial Intelligence

2101.00485

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.68)


Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Coordination by Multi-Critic Policy Gradient Optimization

Alon, Yoav, Zhou, Huiyu

arXiv.org Artificial Intelligence, Dec-31-2020

Recent technological progress in the development of Unmanned Aerial Vehicles (UAVs), together with decreasing acquisition costs, makes the application of drone fleets attractive for a wide variety of tasks. In agriculture, disaster management, search and rescue operations, and commercial and military applications, the advantage of applying a fleet of drones originates from their ability to cooperate autonomously. Multi-Agent Reinforcement Learning approaches that aim to optimize a neural-network-based control policy, such as the best performing actor-critic policy gradient algorithms, struggle to effectively back-propagate errors from distinct reward signal sources and tend to favor lucrative signals while neglecting coordination and exploitation of previously learned similarities. We propose a Multi-Critic Policy Optimization architecture with multiple value estimating networks and a novel advantage function that optimizes a stochastic actor policy network to achieve optimal coordination of agents. Consequently, we apply the algorithm to several tasks that require the collaboration of multiple drones in a physics-based reinforcement learning environment. Our approach achieves a stable policy network update and similarity in reward signal development for an increasing number of agents. The resulting policy achieves optimal coordination and compliance with constraints such as collision avoidance.

agent, architecture, reinforcement, (13 more...)

arXiv.org Artificial Intelligence

2012.15472

Country:

Europe > United Kingdom > England > Leicestershire > Leicester (0.05)
North America > United States > Texas (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(2 more...)

Genre: Research Report (0.65)

Industry:

Education (0.66)
Transportation (0.66)
Information Technology > Robotics & Automation (0.61)
Aerospace & Defense > Aircraft (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents

Ghosh, Arnob, Aggarwal, Vaneet

arXiv.org Artificial Intelligence, Dec-30-2020

We consider a multi-agent Markov strategic interaction over an infinite horizon where agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit when the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the distribution of the state of the agents of different types and the action of the agent. Each agent wants to maximize the discounted sum of rewards over the infinite horizon, which depends on the state of the agent and the distribution of the state of the leaders and followers. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in the above game. We characterize the conditions under which a stationary MMFE exists. Finally, we propose a reinforcement learning (RL) based algorithm that uses a policy gradient approach to find the stationary MMFE when the agents are unaware of the dynamics. We numerically evaluate how this kind of interaction can model cyber attacks among defenders and adversaries, and show how the RL-based algorithm can converge to an equilibrium.

agent, mmfe, population distribution, (12 more...)

arXiv.org Artificial Intelligence

2012.15377

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)


Present-Biased Optimization

Fomin, Fedor V., Fraigniaud, Pierre, Golovach, Petr A.

arXiv.org Artificial Intelligence, Dec-29-2020

This paper explores the behavior of present-biased agents, that is, agents who erroneously anticipate the costs of future actions compared to their real costs. Specifically, the paper extends the original framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning, including procrastination and abandonment, as well as the elegant graph-theoretic model encapsulating this framework recently proposed by Kleinberg and Oren (2014). The benefit of this extension is twofold. First, it enables a fine-grained analysis of the behavior of present-biased agents depending on the optimisation task they have to perform. In particular, we study covering tasks vs. hitting tasks, and show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints. Second, our extension makes it possible to study not only underestimation of future costs, coupled with minimization problems, but also all combinations of minimization/maximization and underestimation/overestimation. We study the four scenarios, and we establish upper bounds on the cost ratio for three of them (the cost ratio for the original scenario was known to be unbounded), providing a complete global picture of the behavior of present-biased agents, as far as optimisation tasks are concerned.

agent, artificial intelligence, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2012.14736

Country:

Europe > Norway > Western Norway > Vestland > Bergen (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)


PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction

Ye, Haishan, Xiong, Wei, Zhang, Tong

arXiv.org Artificial Intelligence, Dec-29-2020

This paper considers the decentralized composite optimization problem. We propose a novel decentralized variance-reduced proximal-gradient algorithmic framework, called PMGT-VR, which is based on a combination of several techniques including multi-consensus, gradient tracking, and variance reduction. The proposed framework relies on an imitation of centralized algorithms and we demonstrate that algorithms under this framework achieve convergence rates similar to that of their centralized counterparts. We also describe and analyze two representative algorithms, PMGT-SAGA and PMGT-LSVRG, and compare them to existing state-of-the-art proximal algorithms. To the best of our knowledge, PMGT-VR is the first variance-reduction method that can solve decentralized composite optimization problems. Numerical experiments are provided to demonstrate the effectiveness of the proposed algorithms.

algorithm, optimization, pmgt-lsvrg, (14 more...)

arXiv.org Artificial Intelligence

2012.15010

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Modeling Social Interaction for Baby in Simulated Environment for Developmental Robotics

Mondol, Md Ashaduzzaman Rubel, Pothula, Aishwarya, Park, Deokgun

arXiv.org Artificial Intelligence, Dec-29-2020

Task-specific AI agents are showing remarkable performance across different domains. However, modeling general AI agents with human-like intelligence will require more than current datasets or purely reward-based environments, which do not include the experiences an infant gathers during its early stages of development. In this paper, we present Simulated Environment for Developmental Robotics (SEDRo). It simulates, for a baby agent, the environments that a human baby experiences from the fetal stage through the first 12 months after birth. SEDRo also includes a mother character to provide social interaction with the agent. To evaluate different developmental milestones of the agent, SEDRo incorporates some experiments from developmental psychology.

agent, interaction, simulated environment, (13 more...)

arXiv.org Artificial Intelligence

2012.14842

Country:

North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.55)


Prosocial Norm Emergence in Multiagent Systems

Mashayekhi, Mehdi, Ajmeri, Nirav, List, George F., Singh, Munindar P.

arXiv.org Artificial Intelligence, Dec-28-2020

Multiagent systems provide a basis for developing systems of autonomous entities and thus find application in a variety of domains. We consider a setting where not only the member agents are adaptive but also the multiagent system itself is adaptive. Specifically, the social structure of a multiagent system can be reflected in the social norms among its members. It is well recognized that the norms that arise in society are not always beneficial to its members. We focus on prosocial norms, which help achieve positive outcomes for society and often provide guidance to agents to act in a manner that takes into account the welfare of others. Specifically, we propose Cha, a framework for the emergence of prosocial norms. Unlike previous norm emergence approaches, Cha supports continual change to a system (agents may enter and leave), and dynamism (norms may change when the environment changes). Importantly, Cha agents incorporate prosocial decision making based on inequity aversion theory, reflecting an intuition of guilt from being antisocial. In this manner, Cha brings together two important themes in prosociality: decision making by individuals and fairness of system-level outcomes. We demonstrate via simulation that Cha can improve aggregate societal gains and fairness of outcomes.

agent, multiagent system, vehicle, (11 more...)

arXiv.org Artificial Intelligence

2012.14581

Country:

North America > United States > New York (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(17 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Transportation (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)


Causal World Models by Unsupervised Deconfounding of Physical Dynamics

Li, Minne, Yang, Mengyue, Liu, Furui, Chen, Xu, Chen, Zhitang, Wang, Jun

arXiv.org Artificial Intelligence, Dec-28-2020

The capability of imagining internally with a mental model of the world is vitally important for human cognition. If an intelligent machine agent can learn a world model to create a "dream" environment, it can then internally ask what-if questions -- simulate alternative futures that have not yet been experienced -- and make optimal decisions accordingly. Existing world models are typically established by learning spatio-temporal regularities embedded in past sensory signals without taking into account confounding factors that influence state transition dynamics. As such, they fail to answer the critical counterfactual questions about "what would have happened" if a certain action policy had been taken. In this paper, we propose Causal World Models (CWMs) that allow unsupervised modeling of relationships between the intervened observations and the alternative futures by learning an estimator of the latent confounding factors. We empirically evaluate our method and demonstrate its effectiveness in a variety of physical reasoning environments. Specifically, we show reductions in sample complexity for reinforcement learning tasks and improvements in counterfactual physical reasoning.

causal world model, international conference, unsupervised deconfounding, (10 more...)

arXiv.org Artificial Intelligence

2012.14228

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)


Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

Zhao, Yangyang, Wang, Zhenyu, Huang, Zhenhua

arXiv.org Artificial Intelligence, Dec-27-2020

Dialogue policy learning based on reinforcement learning is difficult to apply to real users when training dialogue agents from scratch because of the high cost. User simulators, which choose random user goals for the dialogue agent to train on, have been considered an affordable substitute for real users. However, this random sampling method ignores the law of human learning, making the learned dialogue policy inefficient and unstable. We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which replaces the traditional random sampling method with a teacher policy model to realize the dialogue policy for automatic curriculum learning. The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent and the over-repetition penalty, without any requirement of prior knowledge. The learning progress of the dialogue agent reflects the relationship between the dialogue agent's ability and the sampled goals' difficulty for sample efficiency. The over-repetition penalty guarantees the sampled diversity. Experiments show that ACL-DQN improves the effectiveness and stability of dialogue tasks by a statistically significant margin. Furthermore, the framework can be improved by equipping it with different curriculum schedules, which demonstrates that the framework has strong generalizability.

acl-dqn, agent, user goal, (10 more...)

arXiv.org Artificial Intelligence

2012.14072

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Germany > Berlin (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(13 more...)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.74)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
