AITopics

1910.0224

Country:

North America > United States > Montana (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Hunan Province (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

Korbak, Tomasz, Zubek, Julian, Kuciński, Łukasz, Miłoś, Piotr, Rączaszek-Leonardi, Joanna

Developmentally motivated emergence of compositional communication via template transfer

arXiv.org Artificial Intelligence Oct-4-2019

This paper explores a novel approach to achieving emergent compositional communication in multi-agent systems. We propose a training regime implementing template transfer, the idea of carrying over learned biases across contexts. In our method, a sender-receiver pair is first trained with disentangled loss functions and then the receiver is transferred to train a new sender with a standard loss. Unlike other methods (e.g. the obverter algorithm), our approach does not require imposing inductive biases on the architecture of the agents. We experimentally show the emergence of compositional communication using topographical similarity, zero-shot generalization and context independence as evaluation metrics. The presented approach is connected to an important line of work in semiotics and developmental psycholinguistics: it supports a conjecture that compositional communication is scaffolded on simpler communication protocols.

arxiv, communication protocol, template transfer
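Topographic similarity, one of the evaluation metrics named above, is commonly computed as the Spearman rank correlation between pairwise distances in meaning space and pairwise distances in message space. The sketch below assumes attribute-tuple meanings, fixed-length messages, and Hamming distance in both spaces; the function names are our own, and the paper's exact distance choices may differ.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def topographic_similarity(meanings, messages):
    """Spearman correlation (Pearson on ranks) between pairwise meaning
    distances and pairwise message distances, both measured with Hamming."""
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    return pearson(average_ranks(d_meaning), average_ranks(d_message))


# Usage: a perfectly compositional code over two attributes (color, shape),
# where message position 0 names the color and position 1 names the shape.
meanings = [(c, s) for c in range(3) for s in range(3)]
messages = [("x%d" % c, "y%d" % s) for c, s in meanings]
rho = topographic_similarity(meanings, messages)
```

A code in which each message position encodes exactly one attribute scores 1.0; shuffling which message goes with which meaning breaks that correspondence and lowers the score.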

1910.06079

Country:

Europe > Poland > Masovia Province > Warsaw (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Bristol (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ackermann, Johannes, Gabler, Volker, Osa, Takayuki, Sugiyama, Masashi

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

arXiv.org Artificial Intelligence Oct-3-2019

Many real world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, we propose an approach that reduces this bias by using double centralized critics. We evaluate it on six mixed cooperative-competitive tasks, showing a significant advantage over current methods. Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.

agent, communication, maddpg
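Overestimation bias arises because a maximum over noisy value estimates is biased upward even when each individual estimate is unbiased. The sketch below assumes that a TD3-style target, which takes the minimum of two critics' estimates, is representative of the double-critic idea; the paper's exact centralized update may differ. Every true action value is set to zero, so any nonzero mean is pure bias, and the function name is our own.

```python
import random


def mean_target_estimates(num_actions=10, trials=5000, noise=1.0, seed=0):
    """Every true action value is 0, so any nonzero mean is estimation bias.

    Returns (single-critic target, clipped double-critic target), averaged
    over `trials` draws of independent Gaussian critic noise.
    """
    rng = random.Random(seed)
    single_total, double_total = 0.0, 0.0
    for _ in range(trials):
        q1 = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        q2 = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        # Greedy action according to critic 1.
        best = max(range(num_actions), key=lambda a: q1[a])
        # Single critic: evaluate the greedy action with the same noisy critic.
        single_total += q1[best]
        # Clipped double critics: take the minimum of both critics' estimates.
        double_total += min(q1[best], q2[best])
    return single_total / trials, double_total / trials


single_bias, double_bias = mean_target_estimates()
```

With ten actions and unit noise, the single-critic target sits well above zero, while the clipped target sits near or slightly below zero; that mild underestimation is a known side effect of taking the minimum.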

1910.01465

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Europe > Switzerland (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Oct-3-2019

GRAVITAS: A Model Checking Based Planning and Goal Reasoning Framework for Autonomous Systems

Bride, Hadrien, Dong, Jin Song, Green, Ryan, Hou, Zhe, Mahony, Brendan, Oxenham, Martin

While AI techniques have found many successful applications in autonomous systems, many of them permit behaviours that are difficult to interpret and may lead to uncertain results. We follow the "verification as planning" paradigm and propose to use model checking techniques to solve planning and goal reasoning problems for autonomous systems. We give a new formulation of Goal Task Network (GTN) that is tailored for our model checking based framework. We then provide a systematic method that models GTNs in the model checker Process Analysis Toolkit (PAT). We present our planning and goal reasoning system as a framework called Goal Reasoning And Verification for Independent Trusted Autonomous Systems (GRAVITAS) and discuss how it helps provide trustworthy plans in an uncertain environment. Finally, we demonstrate the proposed ideas in an experiment that simulates a survey mission performed by the REMUS-100 autonomous underwater vehicle.

auv, goal reasoning, model checking
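In the "verification as planning" view, the planner asks whether a goal state is reachable and reads the witness trace as a plan. The sketch below is a hypothetical explicit-state breadth-first reachability search. It is not PAT and not a Goal Task Network encoding, and the survey map `ADJ`, the survey points, and the goal condition are invented for illustration.

```python
from collections import deque

# Hypothetical survey area: which locations are directly reachable from which.
ADJ = {
    "base": ["a", "b"],
    "a": ["base", "c"],
    "b": ["base", "c"],
    "c": ["a", "b"],
}
SURVEY = frozenset({"a", "b", "c"})  # hypothetical points that must be surveyed


def successors(state):
    """State = (current location, frozenset of survey points visited)."""
    loc, visited = state
    for nxt in ADJ[loc]:
        yield (nxt, visited | ({nxt} & SURVEY))


def is_goal(state):
    """Goal: every survey point visited and the vehicle back at base."""
    loc, visited = state
    return loc == "base" and visited == SURVEY


def find_plan(start):
    """Breadth-first reachability check: return the shortest trace from
    `start` to a goal state, or None if no goal state is reachable."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


plan = find_plan(("base", frozenset()))
```

Because the search is breadth-first, the returned trace is a shortest plan in number of moves. PAT and similar checkers support far richer models and properties than this toy search.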

1910.0138

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > South Australia (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology (1.00)
Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

arXiv.org Machine Learning Oct-2-2019

Formal Language Constraints for Markov Decision Processes

Quint, Eleanor, Xu, Dong, Dogan, Haluk, Hakguder, Zeynep, Scott, Stephen, Dwyer, Matthew

In order to satisfy safety conditions, a reinforcement learning (RL) agent may be constrained from acting freely, e.g., to prevent trajectories that might cause unwanted behavior or physical damage in a robot. We propose a general framework for augmenting a Markov decision process (MDP) with constraints that are described in formal languages over sequences of MDP states and agent actions. Hard and soft constraint enforcement are implemented, respectively, by filtering the allowed action set and by applying potential-based reward shaping. We instantiate this framework using deterministic finite automata to encode constraints and propose methods of augmenting MDP observations with the state of the constraint automaton for learning. We empirically evaluate these methods with a variety of constraints by training Deep Q-Networks in Atari games as well as Proximal Policy Optimization in MuJoCo environments. We experimentally find that our approaches are effective in significantly reducing or eliminating constraint violations with either minimal negative or, depending on the constraint, a clear positive impact on final performance.

constraint, constraint violation, violation
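The two enforcement modes and the observation augmentation can be illustrated with a toy constraint. The sketch assumes a hypothetical rule, "never take the same action three times in a row", encoded as a deterministic finite automaton over actions. The potential function is our own choice, not the paper's. Hard enforcement drops actions that would drive the automaton into its violating state, and soft enforcement adds the potential-based shaping term F = γΦ(q′) − Φ(q).

```python
VIOLATED = "violated"  # absorbing, rejecting DFA state


def dfa_step(q, action):
    """Hypothetical constraint: never take the same action three times in a row.

    A non-violating state is (last_action, run_length); start is (None, 0).
    """
    if q == VIOLATED:
        return VIOLATED
    last, run = q
    run = run + 1 if action == last else 1
    return VIOLATED if run >= 3 else (action, run)


def allowed_actions(q, actions):
    """Hard enforcement: filter out actions that would violate the constraint."""
    return [a for a in actions if dfa_step(q, a) != VIOLATED]


def potential(q):
    """Assumed potential: lower as the automaton nears its violating state."""
    return -3.0 if q == VIOLATED else -float(q[1])


def shaping_bonus(q, q_next, gamma=0.99):
    """Soft enforcement: potential-based shaping term gamma*phi(q') - phi(q)."""
    return gamma * potential(q_next) - potential(q)


def augment(obs, q):
    """Give the learner the automaton state alongside the MDP observation."""
    return (obs, q)
```

Repeating an action moves the automaton closer to violation and earns a negative shaping term; switching actions resets the run and earns a positive one.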

1910.01074

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

#artificialintelligence Oct-1-2019, 13:26:14 GMT

Using Game-Theory and Decentralization to Scale Multi-Agent Reinforcement Learning Models

When we think about training or learning processes in deep learning solutions, we typically visualize centralized models. In those architectures, a series of central nodes collect and curate the datasets used to train the models that are then deployed across different nodes in a network. Even in distributed scenarios such as multi-agent reinforcement learning (MARL), which can include tens of thousands of nodes running a model, the learning models rely on a handful of centralized nodes. Centralized learning is conceptually simple to implement but incredibly hard to scale. Imagine an Internet of Things (IoT) scenario with hundreds of thousands of devices collecting data and executing a reinforcement learning model.

agent, reinforcement, scenario

Industry: Leisure & Entertainment > Games (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Wu, Cathy, Kreidieh, Aboudy, Parvate, Kanaad, Vinitsky, Eugene, Bayen, Alexandre M

Flow: A Modular Learning Framework for Autonomy in Traffic

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, due to numerous technical, political, and human factors challenges, new methodologies are needed to design vehicles and transportation systems for these positive outcomes. This article tackles important technical challenges arising from the partial adoption of autonomy (hence termed mixed autonomy, to involve both AVs and human-driven vehicles): partial control, partial observation, complex multi-vehicle interactions, and the sheer variety of traffic settings represented by real-world networks. To enable the study of the full diversity of traffic settings, we first propose to decompose traffic control tasks into modules, which may be configured and composed to create new control tasks of interest. These modules include salient aspects of traffic control tasks: networks, actors, control laws, metrics, initialization, and additional dynamics. Second, we study the potential of model-free deep Reinforcement Learning (RL) methods to address the complexity of traffic dynamics. The resulting modular learning framework is called Flow. Using Flow, we create and study a variety of mixed-autonomy settings, including single-lane, multi-lane, and intersection traffic. In all cases, the learned control law exceeds human driving performance (measured by system-level velocity) by at least 40% with only 5-10% adoption of AVs. In the case of partially-observed single-lane traffic, we show that a low-parameter neural network control law can eliminate commonly observed stop-and-go traffic. In particular, the control laws surpass all known model-based controllers, achieving near-optimal performance across a wide spectrum of vehicle densities (even with a memoryless control law) and generalizing to out-of-distribution vehicle densities.

control law, deep learning, neural network
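Performance above is measured by system-level velocity. Human-driven vehicles in such mixed-autonomy simulations are often modeled with a car-following law such as the Intelligent Driver Model (IDM). The sketch below assumes IDM with typical parameter values, a 230 m ring with 22 vehicles, and mean speed as the metric. It is not Flow's API, it contains no learned controller, and the function names are hypothetical.

```python
import math


def idm_accel(v, v_lead, gap, v0=30.0, T=1.0, a=1.0, b=1.5, delta=4, s0=2.0):
    """Intelligent Driver Model acceleration (m/s^2).

    v, v_lead: own and leader speed (m/s); gap: bumper-to-bumper distance (m).
    The parameter defaults are typical values, assumed here.
    """
    desired_gap = s0 + max(0.0, v * T + v * (v - v_lead) / (2 * math.sqrt(a * b)))
    return a * (1 - (v / v0) ** delta - (desired_gap / gap) ** 2)


def simulate_ring(n=22, length=230.0, veh_len=5.0, dt=0.1, steps=600):
    """Euler-integrate n IDM vehicles on a ring road; return the mean speed,
    used here as a simple system-level velocity metric."""
    pos = [i * length / n for i in range(n)]
    vel = [0.0] * n
    for _ in range(steps):
        acc = []
        for i in range(n):
            lead = (i + 1) % n  # the vehicle ahead, wrapping around the ring
            gap = (pos[lead] - pos[i]) % length - veh_len
            acc.append(idm_accel(vel[i], vel[lead], gap))
        vel = [max(0.0, v + dt * acc_i) for v, acc_i in zip(vel, acc)]
        pos = [(p + dt * v) % length for p, v in zip(pos, vel)]
    return sum(vel) / n


mean_speed = simulate_ring()
```

With uniform spacing and no perturbation, the ring settles to steady flow between 3 and 4 m/s rather than stop-and-go waves. Reproducing waves would need noise or a perturbation. In a mixed-autonomy study, a learned control law would replace `idm_accel` for the automated vehicles.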

1710.05465

Country:

Asia > Middle East (0.14)
North America > United States > California (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ganzfried, Sam, Laughlin, Conner, Morefield, Charles

Parallel Algorithm for Approximating Nash Equilibrium in Multiplayer Stochastic Games with Application to Naval Strategic Planning

Sam Ganzfried (Ganzfried Research); Conner Laughlin, Charles Morefield (Arctan, Inc.)

Abstract

Many real-world domains contain multiple agents behaving strategically with probabilistic transitions and uncertain (potentially infinite) duration. Such settings can be modeled as stochastic games. While algorithms have been developed for solving (i.e., computing a game-theoretic solution concept such as Nash equilibrium) two-player zero-sum stochastic games, research on algorithms for nonzero-sum and multi-player stochastic games is very limited. We present a new algorithm for these settings, which constitutes the first parallel algorithm for multiplayer stochastic games. We present experimental results on a 4-player stochastic game motivated by a naval strategic planning scenario, showing that our algorithm is able to quickly compute strategies constituting Nash equilibrium up to a very small degree of approximation.

Introduction

Nash equilibrium has emerged as the most compelling solution concept in multiagent strategic interactions. For two-player zero-sum (adversarial) games, a Nash equilibrium can be computed in polynomial time (e.g., by linear programming). This result holds both for simultaneous-move games (often represented as a matrix), and for sequential games of both perfect and imperfect information (often represented as an extensive-form game tree).

algorithm, artificial intelligence, game theory
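The introduction notes that two-player zero-sum games can be solved in polynomial time, for example by linear programming. A much smaller illustration is the 2×2 zero-sum matrix game, which has a closed-form solution. First check for a pure saddle point; if there is none, each player mixes so that the opponent is indifferent between their two pure strategies. This is neither the general linear-programming method nor the paper's parallel algorithm for multiplayer stochastic games, and the function name is our own.

```python
def solve_2x2_zero_sum(m):
    """Solve the zero-sum game with row-player payoffs m = [[a, b], [c, d]].

    Returns (p, q, value): p = probability of row 0, q = probability of
    column 0, value = the row player's expected payoff at equilibrium.
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best payoff the row player can guarantee
    minimax = min(max(a, c), max(b, d))  # lowest cap the column player can enforce
    if maximin == minimax:
        # Pure saddle point: each player has a pure optimal strategy.
        p = 1.0 if min(a, b) == maximin else 0.0
        q = 1.0 if max(a, c) == minimax else 0.0
        return p, q, float(maximin)
    # No saddle point: mix so the opponent is indifferent. In a 2x2 game
    # without a saddle point, a - b - c + d is nonzero.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Matching pennies: both players mix 50/50 and the game value is 0.
p, q, value = solve_2x2_zero_sum([[1, -1], [-1, 1]])
```

For larger matrices, the same maximin problem is usually written as a linear program and handed to an LP solver.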

1910.00193

Country:

North America > Canada > Alberta (0.14)
Pacific Ocean > North Pacific Ocean > South China Sea (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Poker (0.47)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Dubey, Rohit K., Sohn, Samuel S., Hoelscher, Christoph, Kapadia, Mubbasir

Cognitive Agent Based Simulation Model For Improving Disaster Response Procedures

In the event of a disaster, saving human lives is of utmost importance. For developing proper evacuation procedures and guidance systems, behavioural data on how people respond during panic and stress is crucial. In the absence of real human data on building evacuation, there is a need for a crowd simulator to model egress and decision-making under uncertainty. In this paper, we propose an agent-based simulation tool, which is grounded in human cognition and decision-making, for evaluating and improving the effectiveness of building evacuation procedures and guidance systems during a disaster. Specifically, we propose a predictive agent-wayfinding framework based on information theory that is applied at intersections with variable route choices where it fuses N dynamic information sources. The proposed framework can be used to visualize trajectories and prediction results (i.e., total evacuation time, number of people evacuated) for different combinations of reinforcing or contradicting information sources (i.e., signage, crowd flow, familiarity, and spatial layout). This tool can enable designers to recreate various disaster scenarios and generate simulation data for improving the evacuation procedures and existing guidance systems.

agent, information, information source
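The abstract does not state its fusion rule, so the sketch below rests entirely on assumptions. Each information source (for instance signage or crowd flow) yields a probability distribution over the routes at an intersection. The sources are combined by weighted log-linear pooling, and Shannon entropy measures how uncertain the fused route choice is. Both functions are hypothetical, not the paper's information-theoretic model.

```python
import math


def fuse(sources, weights=None):
    """Weighted log-linear pooling of route distributions:
    p(route) is proportional to the product over sources of p_i(route) ** w_i.
    Every source probability must be strictly positive."""
    weights = weights if weights is not None else [1.0] * len(sources)
    n_routes = len(sources[0])
    logits = [sum(w * math.log(src[r]) for src, w in zip(sources, weights))
              for r in range(n_routes)]
    top = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(l - top) for l in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def entropy_bits(p):
    """Shannon entropy in bits; higher means a less certain route choice."""
    return -sum(x * math.log2(x) for x in p if x > 0)


signage = [0.8, 0.2]                        # hypothetical: signage favors route 0
reinforcing = fuse([signage, [0.7, 0.3]])   # crowd flow agrees
contradicting = fuse([signage, [0.3, 0.7]]) # crowd flow disagrees
```

Reinforcing sources yield a sharper, lower-entropy route choice than contradicting ones, which is the kind of contrast the tool is described as exposing.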

1910.00767

Country:

Asia > Singapore (0.05)
North America > United States > New Jersey (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Relational Graph Learning for Crowd Navigation

Chen, Changan, Hu, Sha, Nikdel, Payam, Mori, Greg, Savva, Manolis

We present a relational graph learning approach for robotic crowd navigation using model-based deep reinforcement learning that plans actions by looking into the future. Our approach reasons about the relations between all agents based on their latent features and uses a Graph Convolutional Network to encode higher-order interactions in each agent's state representation, which is subsequently leveraged for state prediction and value estimation. The ability to predict human motion allows us to perform multi-step lookahead planning, taking into account the temporal evolution of human crowds. We evaluate our approach against a state-of-the-art baseline for crowd navigation and ablations of our model to demonstrate that navigation with our approach is more efficient, results in fewer collisions, and avoids failure cases involving oscillatory and freezing behaviors.

I. INTRODUCTION

Inferring the underlying relations between components of complex dynamic systems can inform decision making for autonomous agents. One natural system with complex dynamics is crowd navigation (i.e., navigation in the presence of multiple humans). The crowd navigation task is challenging as the agent must predict and plan relative to likely human motions so as to avoid collisions and remain at safe and socially appropriate distances from people. Some prior work predicts human trajectories using handcrafted social interaction models [1] or by modeling the temporal behavior of humans [2]. Although these methods can estimate human trajectories, they do not use the prediction to inform the navigation policy.

interaction, navigation, relation
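The relational reasoning step can be sketched as two matrix operations. First, infer a row-normalized relation matrix from agent features; then propagate features along it with a graph convolution. The form A = softmax(X W_a Xᵀ) and the single ReLU layer H = ReLU(A X W) are our assumptions about one reasonable instantiation, written in plain Python with made-up weights. They are not taken from the paper's code.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, so each agent's relation weights sum to 1."""
    out = []
    for row in m:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def relational_gcn_layer(x, w_a, w):
    """x: n_agents x d feature matrix (robot plus humans).

    Assumed relation inference: A = softmax(X W_a X^T).
    Assumed graph convolution:  H = ReLU(A X W).
    Returns (A, H).
    """
    a = softmax_rows(matmul(matmul(x, w_a), transpose(x)))
    return a, relu(matmul(matmul(a, x), w))


# Hypothetical 2-d latent features for three agents and identity weights.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
relations, hidden = relational_gcn_layer(features, identity, identity)
```

In a trained model, W_a and W would be learned, and H would feed the state-prediction and value-estimation heads described above.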

1909.13165

Genre: Research Report (0.53)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)