 Agents


Modeling Theory of Mind for Autonomous Agents with Probabilistic Programs

arXiv.org Artificial Intelligence

As autonomous agents become more ubiquitous, they will eventually have to reason about the mental states of other agents, including those agents' beliefs, desires, and goals: so-called theory of mind reasoning. We introduce a collection of increasingly complex theory of mind models of a "chaser" pursuing a "runner", known as the Chaser-Runner model. We show that our implementation is a relatively straightforward theory of mind model that can capture a variety of rich behaviors, which in turn increase runner detection rates relative to basic (non-theory-of-mind) models. In addition, our paper demonstrates that (1) using a planning-as-inference formulation based on nested importance sampling results in agents simultaneously reasoning about other agents' plans and crafting counter-plans, (2) probabilistic programming is a natural way to describe models in which each agent uses complex primitives such as path planners to make decisions, and (3) allocating additional computation to perform nested reasoning about agents results in lower-variance estimates of expected utility.


Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations

arXiv.org Machine Learning

Multi-agent reinforcement learning systems aim to provide interacting agents with the ability to collaboratively learn and adapt to the behaviour of other agents. In many real-world applications, the agents can only acquire a partial view of the world. Here we consider a setting in which most agents' observations are also extremely noisy, and hence only weakly correlated with the true state of the environment. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case that an agent's policy can be made conditional upon all other agents' observations. To overcome these difficulties, we propose a multi-agent deep deterministic policy gradient algorithm enhanced by a communication medium (MADDPG-M), which implements a two-level, concurrent learning mechanism. An agent's policy depends on its own private observations as well as those explicitly shared by others through a communication medium. At any given point in time, an agent must decide whether its private observations are sufficiently informative to be shared with others. However, our environments provide no explicit feedback informing an agent whether a communication action is beneficial; rather, the communication policies must also be learned through experience, concurrently with the main policies. Our experimental results demonstrate that the algorithm performs well in six highly non-stationary environments of progressively higher complexity, and offers substantial performance gains compared to the baselines.


That's Mine! Learning Ownership Relations and Norms for Robots

arXiv.org Artificial Intelligence

The ability of autonomous agents to learn and conform to human norms is crucial for their safety and effectiveness in social environments. While recent work has led to frameworks for the representation and inference of simple social rules, research into norm learning remains at an exploratory stage. Here, we present a robotic system capable of representing, learning, and inferring ownership relations and norms. Ownership is represented as a graph of probabilistic relations between objects and their owners, along with a database of predicate-based norms that constrain the actions permissible on owned objects. To learn these norms and relations, our system integrates (i) a novel incremental norm learning algorithm capable of both one-shot learning and induction from specific examples, (ii) Bayesian inference of ownership relations in response to apparent rule violations, and (iii) percept-based prediction of an object's likely owners. Through a series of simulated and real-world experiments, we demonstrate the competence and flexibility of the system in performing object manipulation tasks that require a variety of norms to be followed, laying the groundwork for future research into the acquisition and application of social norms.


Learning Curriculum Policies for Reinforcement Learning

arXiv.org Artificial Intelligence

Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task by first training on a series of simpler tasks and transferring the knowledge acquired to the target task. Automatically choosing a sequence of such tasks (i.e., a curriculum) is an open problem that has been the subject of much recent work in this area. In this paper, we build upon a recent method for curriculum design, which formulates the curriculum sequencing problem as a Markov Decision Process. We extend this model to handle multiple transfer learning algorithms, and show for the first time that a curriculum policy over this MDP can be learned from experience. We explore various representations that make this possible, and evaluate our approach by learning curriculum policies for multiple agents in two different domains. The results show that our method produces curricula that can train agents to perform on a target task as fast as or faster than existing methods.


Control with Distributed Deep Reinforcement Learning: Learn a Better Policy

arXiv.org Artificial Intelligence

A distributed approach is a very effective way to improve the training efficiency of reinforcement learning. In this paper, we propose a new heuristic distributed architecture for deep reinforcement learning (DRL) algorithms, in which a PSO-based network update mechanism is adopted to speed up learning an optimal policy, in addition to using multiple agents for parallel training. In this mechanism, the update of each agent's neural network depends not only on that agent's own training result but also on the best neural network among all agents. To verify the effectiveness of the proposed method, the architecture is implemented on the Deep Q-Network (DQN) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm to train on several typical control problems. The training results show that the proposed method is effective. Reinforcement learning is about an agent interacting with the environment, learning an optimal policy by trial and error.
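The idea of pulling each agent's network toward the best-performing one can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `pso_style_update`, the `pull` coefficient, and the flat parameter lists are all assumptions made for illustration, loosely following the attraction term of particle swarm optimization.

```python
import random

def pso_style_update(own, best, pull, rng=None):
    """Move an agent's freshly trained parameters toward the best agent's.

    own  -- this agent's parameters after its own training step (hypothetical)
    best -- parameters of the best-performing agent (hypothetical)
    pull -- attraction strength; 0 keeps `own`, 1 can jump all the way to `best`
    rng  -- optional random.Random; PSO usually scales the pull by a random
            factor in [0, 1). With rng=None the factor is fixed at 1.
    """
    r = 1.0 if rng is None else rng.random()
    return [w + pull * r * (b - w) for w, b in zip(own, best)]

# Deterministic example: move halfway toward the best agent's parameters.
print(pso_style_update([0.0, 2.0], [1.0, 0.0], pull=0.5))  # [0.5, 1.0]

# Stochastic variant, as in standard PSO (result depends on the seed).
noisy = pso_style_update([0.0, 2.0], [1.0, 0.0], pull=0.5, rng=random.Random(0))
```

With `pull=0` each agent trains independently; larger values trade diversity among agents for faster convergence toward the current best network.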


Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.


Clear the Fog: Combat Value Assessment in Incomplete Information Games with Convolutional Encoder-Decoders

arXiv.org Artificial Intelligence

StarCraft, one of the most popular real-time strategy games, is a compelling environment for artificial intelligence research for both micro-level unit control and macro-level strategic decision making. In this study, we address a prominent problem in macro-level decision making, known as the 'fog-of-war', which arises naturally from the fact that information regarding the opponent's state is always provided in incomplete form. For intelligent agents that play like human players, making accurate predictions of the opponent's status under incomplete information will increase their chance of winning. To reflect this fact, we propose a convolutional encoder-decoder architecture that predicts potential counts and locations of the opponent's units based on only partially visible and noisy information. To evaluate the performance of our proposed method, we train an additional classifier on the encoder-decoder output to predict the game outcome (win or lose). Finally, we designed an agent incorporating the proposed method and conducted simulation games against rule-based agents to demonstrate both effectiveness and practicality. All experiments were conducted on actual game replay data acquired from professional players.


Be careful what you write! Customer service agents see what you're typing BEFORE you press send

Daily Mail - Science & tech

A worrying feature in customer service chats has been discovered, and it has many users concerned about their privacy. A growing number of live chat services, which are often used to connect customer service representatives with users in need of help, have been found to be equipped with 'real-time typing view,' according to Gizmodo. This lets customer service representatives see what you're typing even before you send it. While many claim the feature is meant to help customer service reps prepare an answer to your question ahead of time, it's unclear if users are aware of the tool. The issue came to light after Gizmodo received a screenshot from a reader, wherein they bluntly asked a representative in a live chat whether they could see their messages before they were sent.


Data-driven Conceptual Spaces: Creating Semantic Representations For Linguistic Descriptions Of Numerical Data

Journal of Artificial Intelligence Research

There is an increasing need to derive semantics from real-world observations to facilitate natural information sharing between machines and humans. Conceptual spaces theory is a possible approach and has been proposed as a mid-level representation between symbolic and sub-symbolic representations, whereby concepts are represented in a geometrical space that is characterised by a number of quality dimensions. Currently, much of the work has demonstrated how conceptual spaces are created in a knowledge-driven manner, relying on prior knowledge to form concepts and identify quality dimensions. This paper presents a method to create semantic representations using data-driven conceptual spaces, which are then used to derive linguistic descriptions of numerical data. Our contribution is a principled approach to automatically construct a conceptual space from a set of known observations wherein the quality dimensions and domains are not known a priori. The novelty of the approach is the ability to select and group semantic features to discriminate between concepts in a data-driven manner while preserving the semantic interpretation that is needed to infer linguistic descriptions for interaction with humans. Two data sets representing leaf images and time series signals are used to evaluate the method. An empirical evaluation for each case study assesses how well linguistic descriptions generated from the conceptual spaces identify unknown observations. Furthermore, comparisons are made with descriptions derived from alternative approaches for generating semantic models.


Analyzing Federated Learning through an Adversarial Lens

arXiv.org Artificial Intelligence

Federated learning distributes model training among a multitude of agents, who, guided by privacy concerns, perform training using their local data but share only model parameter updates, for iterative aggregation at the server. In this work, we explore the threat of model poisoning attacks on federated learning initiated by a single, non-colluding malicious agent where the adversarial objective is to cause the model to misclassify a set of chosen inputs with high confidence. We explore a number of strategies to carry out this attack, starting with simple boosting of the malicious agent's update to overcome the effects of other agents' updates. To increase attack stealth, we propose an alternating minimization strategy, which alternately optimizes for the training loss and the adversarial objective. We follow up by using parameter estimation for the benign agents' updates to improve on attack success. Finally, we use a suite of interpretability techniques to generate visual explanations of model decisions for both benign and malicious models and show that the explanations are nearly visually indistinguishable. Our results indicate that even a highly constrained adversary can carry out model poisoning attacks while simultaneously maintaining stealth, thus highlighting the vulnerability of the federated learning setting and the need to develop effective defense strategies.
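Why simple boosting can overcome the other agents' updates is easy to see with a small sketch. It assumes, for illustration only, that the server takes a plain unweighted mean of the agents' updates; the function names `federated_average` and `boost` and the toy two-parameter model are hypothetical and are not taken from the paper.

```python
def federated_average(global_weights, updates):
    """Server step (assumed here): add the unweighted mean of all updates."""
    n = len(updates)
    return [
        w + sum(update[i] for update in updates) / n
        for i, w in enumerate(global_weights)
    ]

def boost(update, factor):
    """Scale a malicious update so that it survives the server's averaging."""
    return [factor * x for x in update]

global_weights = [0.0, 0.0]
benign = [[0.0, 0.0]] * 3   # three benign agents that barely move the model
desired = [1.0, -2.0]       # the change the attacker wants to see applied

# Without boosting, averaging over 4 agents shrinks the malicious update to 1/4.
print(federated_average(global_weights, benign + [desired]))            # [0.25, -0.5]

# Boosting by the number of agents cancels the 1/n factor.
print(federated_average(global_weights, benign + [boost(desired, 4)]))  # [1.0, -2.0]
```

A boosted update is much larger than the benign ones, which is one way to see why a stealthier strategy is needed to avoid detection.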