AITopics | Agents

Agents

Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG

Mao, Hangyu, Zhang, Zhengchao, Xiao, Zhen, Gong, Zhibo

arXiv.org Artificial Intelligence, Nov-13-2018

Modelling and exploiting teammates' policies in cooperative multi-agent systems have long been of interest, and also a big challenge, for the reinforcement learning (RL) community. The interest lies in the fact that if an agent knows its teammates' policies, it can adjust its own policy accordingly to achieve proper cooperation; the challenge is that the agents' policies change continuously because the agents are learning concurrently, which makes it difficult to model the teammates' dynamic policies accurately. In this paper, we present ATTention Multi-Agent Deep Deterministic Policy Gradient (ATT-MADDPG) to address this challenge. ATT-MADDPG extends DDPG, a single-agent actor-critic RL method, with two special designs. First, to model the teammates' policies, the agent needs access to the teammates' observations and actions; ATT-MADDPG adopts a centralized critic to collect this information. Second, to model the teammates' policies effectively from the collected information, ATT-MADDPG enhances the centralized critic with an attention mechanism. This attention mechanism introduces a special structure that explicitly models the dynamic joint policy of teammates, ensuring that the collected information is processed efficiently. We evaluate ATT-MADDPG on both benchmark tasks and real-world packet routing tasks. Experimental results show that it not only outperforms state-of-the-art RL-based and rule-based methods by a large margin, but also achieves better scalability and robustness.
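
The abstract does not spell out the critic's architecture, so the following is only a minimal sketch of the general idea of an attention-weighted centralized critic, under stated assumptions: the critic is assumed to produce several action-value "heads", each standing for a different hypothesis about the teammates' joint policy, and a softmax over query/key dot products (query from the agent's own observation and action, keys from the teammates' observations and actions) decides how much each head counts. The names `attention_critic`, `query`, `keys`, and `q_heads` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_critic(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    q_heads: Sequence[float],
) -> Tuple[float, List[float]]:
    """Combine K action-value heads using attention weights (hypothetical design).

    query:   features of the agent's own observation/action (assumed).
    keys:    one feature vector per head, built from the teammates'
             observations and actions gathered by the centralized critic (assumed).
    q_heads: one action-value estimate per head, each corresponding to a
             different hypothesis about the teammates' joint policy (assumed).
    """
    if len(keys) != len(q_heads):
        raise ValueError("need exactly one key per action-value head")
    # Attention weights: softmax of query/key similarity, summing to 1.
    weights = softmax([dot(query, k) for k in keys])
    # Critic output: attention-weighted average of the heads.
    q_value = sum(w * q for w, q in zip(weights, q_heads))
    return q_value, weights
```

For example, `attention_critic([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [5.0, 1.0])` puts most of the weight on the first head, whose key matches the query, so the combined value lands close to 5.0. Because the weights are recomputed from the current teammate features at every step, the combination can shift as the teammates' policies change, which is the intuition behind modelling a dynamic joint policy.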

agent, artificial intelligence, teammate, (15 more...)

1811.07029

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

Siddhant, Aditya, Goyal, Anuj, Metallinou, Angeliki

arXiv.org Artificial Intelligence, Nov-13-2018

User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Models (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and can even outperform conventional supervised transfer. We also show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low-resource settings: using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.

artificial intelligence, machine learning, natural language, (19 more...)

1811.0537

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus

Cobb, Adam D., Everett, Richard, Markham, Andrew, Roberts, Stephen J.

arXiv.org Artificial Intelligence, Nov-12-2018

In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. These agents often operate in diverse, non-stationary environments, where models rely on hand-crafted, environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field and canonical vector calculus operations on that field. Notably, we do this using only agent trajectories, without requiring knowledge of the environment, and, by exploiting our probabilistic method, we also obtain a metric for the significance of the inferred causal features in the environment. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.
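
The vector-calculus criterion behind "sources and sinks" can be illustrated in a few lines: a point where a vector field has positive divergence behaves like a source, and a point with negative divergence behaves like a sink. GP-LAPLACE infers the field and these operators jointly with Gaussian processes from trajectories; the sketch below deliberately swaps that in for central finite differences on a known analytic field, purely to show the classification rule. The function names and the tolerance are hypothetical.

```python
from typing import Callable, Tuple

# A 2-D vector field: (x, y) -> (F_x, F_y).
Field = Callable[[float, float], Tuple[float, float]]


def divergence(field: Field, x: float, y: float, h: float = 1e-4) -> float:
    """Approximate dF_x/dx + dF_y/dy with central finite differences.

    This is a stand-in for the GP-based estimate, not the paper's method.
    """
    fx_plus, _ = field(x + h, y)
    fx_minus, _ = field(x - h, y)
    _, fy_plus = field(x, y + h)
    _, fy_minus = field(x, y - h)
    return (fx_plus - fx_minus) / (2 * h) + (fy_plus - fy_minus) / (2 * h)


def classify(field: Field, x: float, y: float, tol: float = 1e-6) -> str:
    """Label a point as a source (flow out), sink (flow in), or neither."""
    d = divergence(field, x, y)
    if d > tol:
        return "source"
    if d < -tol:
        return "sink"
    return "neither"
```

For instance, the outward field `lambda x, y: (x, y)` has divergence 2 everywhere and is classified as a source, its negation is a sink, and the pure rotation `lambda x, y: (-y, x)` has zero divergence and is neither. A probabilistic estimate, as in the paper, would additionally attach uncertainty to each divergence value, which is what makes a significance metric possible.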

artificial intelligence, potential function, trajectory, (18 more...)

doi: 10.1145/3219819.3220065

1802.10446

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.15)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Agent Embeddings: A Latent Representation for Pole-Balancing Networks

Chang, Oscar, Kwiatkowski, Robert, Chen, Siyuan, Lipson, Hod

arXiv.org Artificial Intelligence, Nov-11-2018

We show that it is possible to reduce a high-dimensional object like a neural network agent to a low-dimensional vector representation with semantic meaning, which we call agent embeddings, akin to word or face embeddings. This can be done by collecting examples of existing networks, vectorizing their weights, and then learning a generative model over the weight space in a supervised fashion. We investigate a pole-balancing task, Cart-Pole, as a case study and show that multiple new pole-balancing networks can be generated from their agent embeddings without direct access to training data from the Cart-Pole simulator. In general, the learned embedding space is helpful for mapping out the space of solutions for a given task. In the case of Cart-Pole, we make the surprising observation that good agents make different decisions despite learning similar representations, whereas bad agents make similar (bad) decisions while learning dissimilar representations. Linearly interpolating between the latent embeddings for a good agent and a bad agent yields an agent embedding that generates a network with intermediate performance, where the performance can be tuned according to the coefficient of interpolation. Linear extrapolation in the latent space also results in performance boosts, up to a point.
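
The interpolation and extrapolation described above amount to a single formula on embedding vectors, z(alpha) = (1 - alpha) * z_bad + alpha * z_good: values of alpha between 0 and 1 interpolate, and values above 1 extrapolate past the good agent. The sketch below assumes embeddings are plain lists of floats; the decoder that turns an embedding back into network weights is the paper's learned generative model and is not reproduced here, and the function name `lerp_embedding` is hypothetical.

```python
from typing import List, Sequence


def lerp_embedding(
    z_bad: Sequence[float], z_good: Sequence[float], alpha: float
) -> List[float]:
    """Return (1 - alpha) * z_bad + alpha * z_good, element-wise.

    alpha = 0 gives z_bad, alpha = 1 gives z_good, 0 < alpha < 1
    interpolates, and alpha > 1 extrapolates beyond z_good.
    """
    if len(z_bad) != len(z_good):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_bad, z_good)]
```

In a full pipeline, each resulting vector would be passed to a (hypothetical here) decoder to produce a network whose performance could then be measured as a function of alpha.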

artificial intelligence, machine learning, natural language, (19 more...)

1811.04516

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Playing by the Book: Towards Agent-based Narrative Understanding through Role-playing and Simulation

Tamari, Ronen, Shindo, Hiroyuki, Shahaf, Dafna, Matsumoto, Yuji

arXiv.org Machine Learning, Nov-10-2018

Understanding procedural text requires tracking entities, actions and effects as the narrative unfolds (often implicitly). We focus on the challenging real-world problem of structured narrative extraction in the materials science domain, where language is highly specialized and suitable annotated data is not publicly available. We propose an approach, Text2Quest, where procedural text is interpreted as instructions for an interactive game. A reinforcement-learning agent completes the game by understanding and executing the procedure correctly, in a text-based simulated lab environment. The framework is intended to be more broadly applicable to other domain-specific and data-scarce settings. We conclude with a discussion of challenges and interesting potential extensions enabled by the agent-based perspective.

machine learning, natural language, reinforcement learning, (18 more...)

1811.04319

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)
(2 more...)

The Price of Governance: A Middle Ground Solution to Coordination in Organizational Control

Yu, Chao

arXiv.org Artificial Intelligence, Nov-9-2018

Achieving coordination is crucial in organizational control. This paper investigates a middle ground solution between decentralized interactions and centralized administration for coordinating agents to move beyond inefficient behavior. We first propose the price of governance (PoG) to evaluate how such a middle ground solution performs in terms of effectiveness and cost. We then propose a hierarchical supervision framework to explicitly model the PoG, and define step by step how to realize the core principle of the framework and compute the optimal PoG for a control problem. Two illustrative case studies exemplify the applications of the proposed framework and its methodology. Results show that, by properly formulating and implementing each step, the hierarchical supervision framework is capable of promoting coordination among agents while keeping administrative cost to a minimum in different kinds of organizational control problems.

agent, artificial intelligence, machine learning, (15 more...)

1811.03819

Country:

Asia (0.28)
North America > United States (0.15)

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.94)
Information Technology > Communications > Networks (0.68)

Learning from Demonstration in the Wild

Behbahani, Feryal, Shiarlis, Kyriacos, Chen, Xi, Kurin, Vitaly, Kasewa, Sudhanshu, Stirbu, Ciprian, Gomes, João, Paul, Supratik, Oliehoek, Frans A., Messias, João, Whiteson, Shimon

arXiv.org Machine Learning, Nov-8-2018

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on artificially generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose video to behaviour (ViBe), a new approach to learning models of road user behaviour that requires as input only unlabelled raw video data of a traffic scene collected from a single, monocular, uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.

Learning from demonstration (LfD) is a machine learning technique that can learn complex behaviours from a dataset of expert trajectories, called demonstrations. LfD is particularly useful in settings where hand-coding behaviour, or engineering a suitable reward function, is too difficult or labour intensive. While LfD has succeeded in a wide range of problems [1], [2], [3], nearly all methods rely on either artificially generated demonstrations (e.g., from laboratory subjects) or those collected by specially deployed sensors (e.g., MOCAP). These restrictions greatly limit the practical applicability of LfD, which to date has largely not been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for other purposes. For example, consider the problem of training autonomous vehicles to navigate in the presence of human road users.

machine learning, reinforcement learning, trajectory, (15 more...)

1811.03516

Country:

Europe > United Kingdom > England (0.28)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Analysis of Fleet Modularity in an Artificial Intelligence-Based Attacker-Defender Game

Li, Xingyu, Epureanu, Bogdan I.

arXiv.org Artificial Intelligence, Nov-8-2018

Because combat environments change over time and technology upgrades are widespread for ground vehicles, a large number of vehicles and equipment quickly become obsolete. A possible solution for the U.S. Army is to develop fleets of modular military vehicles, which are built from interchangeable substantial components, also known as modules. A typical characteristic of modules is their ease of assembly and disassembly through simple means such as plug-in/pull-out actions, which allows for real-time fleet reconfiguration to meet dynamic demands. Moreover, military demands are time-varying and highly stochastic because commanders keep reacting to the enemy's actions. To capture these characteristics, we formulated an intelligent agent-based model that imitates the decision-making process during fleet operation by combining real-time optimization with artificial intelligence. The agents are capable of inferring the enemy's future moves from historical data and optimizing dispatch/operation decisions accordingly. We implement our model to simulate an attacker-defender game between two adversarial and intelligent players, representing the commanders of a modularized fleet and a conventional fleet, respectively. Given the same level of combat resources and intelligence, we highlight the tactical advantages of fleet modularity in terms of win rate, unpredictability, and suffered damage.

artificial intelligence, machine learning, vehicle, (17 more...)

1811.03742

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Government > Military > Army (0.88)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Unveiling Swarm Intelligence with Network Science - the Metaphor Explained

Oliveira, Marcos, Pinheiro, Diego, Macedo, Mariana, Bastos-Filho, Carmelo, Menezes, Ronaldo

arXiv.org Artificial Intelligence, Nov-8-2018

Self-organization is a natural phenomenon that emerges in systems with a large number of interacting components. Self-organized systems show robustness, scalability, and flexibility, which are essential properties when handling real-world problems. Swarm intelligence seeks to design nature-inspired algorithms with a high degree of self-organization. Yet we do not know why swarm-based algorithms work well, nor can we compare the different approaches in the literature. The lack of a common framework capable of characterizing these various swarm-based algorithms, transcending their particularities, has led to a stream of publications inspired by different aspects of nature without much regard as to whether they are similar to already existing approaches. We address this gap by introducing a network-based framework (the interaction network) to examine computational swarm-based systems through the lens of social dynamics. We discuss the social dimension of several swarm classes and provide a case study of Particle Swarm Optimization. The interaction network enables a better understanding of the plethora of approaches currently available by looking at them from a general perspective that focuses on the structure of the social interactions.
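
To make the idea of an interaction network concrete, the following is a minimal sketch under explicit assumptions, not the authors' construction: a tiny 1-D Particle Swarm Optimization with a ring (local-best) topology minimizes the sphere function x^2, and every time particle i's velocity update is pulled toward neighbour j's best position, the edge (j, i) gains one unit of weight. The objective, topology, coefficients, and the function name `pso_interaction_network` are all illustrative choices.

```python
import random
from collections import Counter
from typing import Tuple


def sphere(x: float) -> float:
    """Objective to minimize (illustrative choice)."""
    return x * x


def pso_interaction_network(
    n_particles: int = 6,
    iterations: int = 20,
    seed: int = 0,
    w: float = 0.7,   # inertia weight
    c1: float = 1.5,  # pull toward the particle's own best
    c2: float = 1.5,  # pull toward the best neighbour's best
) -> Counter:
    """Run a 1-D ring-topology PSO and count who influenced whom.

    Returns a Counter mapping (influencer j, influenced i) -> number of
    iterations in which j's best position steered i's velocity update.
    """
    rng = random.Random(seed)
    pos = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    edges: Counter = Counter()
    for _ in range(iterations):
        for i in range(n_particles):
            # Ring neighbourhood: left neighbour, self, right neighbour.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            j = min(ring, key=lambda k: sphere(pbest[k]))
            edge: Tuple[int, int] = (j, i)
            edges[edge] += 1
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (pbest[j] - pos[i]))
            pos[i] += vel[i]
            if sphere(pos[i]) < sphere(pbest[i]):
                pbest[i] = pos[i]
    return edges
```

Each particle records exactly one influence per iteration, so the total edge weight equals `n_particles * iterations`; the interesting part is how that weight is distributed across edges, which is the kind of structural signal the interaction-network view is meant to expose.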

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

1811.03539

Country:

North America > United States (0.93)
Europe > United Kingdom > England > Devon (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Rotational Diversity in Multi-Cycle Assignment Problems

Spieker, Helge, Gotlieb, Arnaud, Mossige, Morten

arXiv.org Artificial Intelligence, Nov-8-2018

In multi-cycle assignment problems with rotational diversity, a set of tasks has to be repeatedly assigned to a set of agents. Over multiple cycles, the goal is to achieve a high diversity of assignments from tasks to agents. At the same time, the assignments' profit has to be maximized in each cycle. Due to the changing availability of tasks and agents, planning ahead is infeasible, and each cycle is an independent assignment problem, albeit one influenced by previous choices. We approach the multi-cycle assignment problem as a two-part problem: profit maximization and rotation are combined into one objective value, which is then solved as a General Assignment Problem. Rotational diversity is maintained with a single execution of the costly assignment model. Our simple yet effective method is applicable to different domains and applications. Experiments show its applicability on a multi-cycle variant of the multiple knapsack problem and in a real-world case study on the test case selection and assignment problem, an example from the software engineering domain, where test cases have to be distributed over compatible test machines.
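
The "combine profit and rotation into one objective, then solve once per cycle" idea can be sketched on a toy instance. The paper solves a General Assignment Problem; this sketch simplifies that to a one-to-one assignment (at most one task per agent, no capacities) found by brute force, and it assumes a hypothetical rotation bonus equal to `weight` times the number of cycles since a task last ran on an agent. The names `rotation_gap` and `assign_cycle` are illustrative only.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

History = Dict[Tuple[int, int], int]  # (task, agent) -> last cycle it was assigned


def rotation_gap(history: History, task: int, agent: int, cycle: int) -> int:
    """Cycles since `task` last ran on `agent` (hypothetical rotation measure).

    Pairs never assigned before are treated as if last assigned at cycle -1,
    so they always look at least as "fresh" as any previously used pair.
    """
    last = history.get((task, agent))
    return cycle + 1 if last is None else cycle - last


def assign_cycle(
    profit: Sequence[Sequence[float]],
    history: History,
    cycle: int,
    weight: float = 1.0,
) -> List[int]:
    """Pick the one-to-one assignment maximizing profit + weight * rotation gap.

    Returns agents[t] for each task t and records the choice in `history`.
    Brute force over permutations: only viable for tiny instances.
    """
    n_tasks, n_agents = len(profit), len(profit[0])
    if n_tasks > n_agents:
        raise ValueError("this sketch assigns at most one task per agent")
    best: Tuple[int, ...] = ()
    best_score = float("-inf")
    for agents in permutations(range(n_agents), n_tasks):
        # Single combined objective: profit plus weighted rotation bonus.
        score = sum(
            profit[t][a] + weight * rotation_gap(history, t, a, cycle)
            for t, a in enumerate(agents)
        )
        if score > best_score:
            best, best_score = agents, score
    for t, a in enumerate(best):
        history[(t, a)] = cycle
    return list(best)
```

With equal profits, for example `[[5.0, 5.0], [5.0, 5.0]]`, successive cycles alternate between `[0, 1]` and `[1, 0]`, because the rotation bonus favours pairs that have rested longer; with strongly skewed profits such as `[[10.0, 0.0], [0.0, 10.0]]`, profit dominates and the assignment stays fixed. The `weight` parameter is the knob that trades profit against diversity.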

agent, artificial intelligence, optimization problem, (15 more...)

1811.03496

Country: Europe > Norway (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)
