Agents
Re-determinizing Information Set Monte Carlo Tree Search in Hanabi
This technical report documents the winner of the Computational Intelligence in Games (CIG) 2018 Hanabi competition. We introduce Re-determinizing IS-MCTS, a novel extension of Information Set Monte Carlo Tree Search (IS-MCTS) \cite{IS-MCTS} that prevents a leakage of hidden information into opponent models that can occur in IS-MCTS and is particularly severe in Hanabi. Re-determinizing IS-MCTS scores higher in Hanabi for 2-4 players than previously published work. Given the 40 ms competition time limit per move, we use a learned evaluation function to estimate leaf node values and avoid full simulations during MCTS. For the Mixed track competition, in which the identity of the other players is unknown, we use a simple Bayesian opponent model that is updated as each game proceeds.
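The Bayesian opponent model mentioned above can be illustrated with a generic posterior update over candidate opponent types. This is a minimal sketch, not the report's implementation: the type names, the likelihood values, and the function `update_posterior` are all hypothetical and chosen only to show the update rule.

```python
def update_posterior(prior, likelihoods):
    """Bayes update over opponent types (hypothetical helper).

    prior:       dict mapping type name -> P(type)
    likelihoods: dict mapping type name -> P(observed move | type)
    """
    unnormalized = {t: prior[t] * likelihoods[t] for t in prior}
    total = sum(unnormalized.values())
    if total == 0:
        # The observation is impossible under every type; keep the prior.
        return dict(prior)
    return {t: p / total for t, p in unnormalized.items()}


# Assumed example: two made-up opponent types with a uniform prior.
belief = {"rule_based": 0.5, "random": 0.5}
# Assumed likelihoods of the move just observed under each type.
belief = update_posterior(belief, {"rule_based": 0.8, "random": 0.2})
```

Calling the function once per observed move keeps a running belief over who the other players might be, which is the sense in which such a model is updated as a game proceeds.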
Google open-sources PlaNet, an AI agent that learns about the world from images
But it's not always practical; model-free approaches, which aim to get agents to directly predict actions from observations about their world, can take weeks of training. Model-based reinforcement learning is a viable alternative -- it has agents come up with a general model of their environment they can use to plan ahead. But in order to accurately forecast actions in unfamiliar surroundings, those agents have to formulate rules from experience. Toward that end, Google in collaboration with DeepMind today introduced the Deep Planning Network (PlaNet) agent, which learns a world model from image inputs and leverages it for planning. It's able to solve a variety of image-based tasks with up to 5,000 percent the data efficiency, Google says, while maintaining competitiveness with advanced model-free agents.
Privacy of Existence of Secrets: Introducing Steganographic DCOPs and Revisiting DCOP Frameworks
Silaghi, Viorel D., Silaghi, Marius C., Mandiau, René
Here we identify a type of privacy concern in Distributed Constraint Optimization (DCOPs) not previously addressed in the literature, despite its importance and impact on the application field: the privacy of existence of secrets. Science only starts where metrics and assumptions are clearly defined. The area of Distributed Constraint Optimization has emerged at the intersection of the multi-agent system community and constraint programming. For the multi-agent community, the constraint optimization problems are an elegant way to express many of the problems occurring in trading and distributed robotics. For the theoretical constraint programming community, the DCOPs are a natural extension of their main object of study, the constraint satisfaction problem. As such, the understanding of the DCOP framework has been refined with the needs of the two communities, but sometimes without spelling out the new assumptions formally, which makes it difficult to compare techniques. Here we give a direction to the efforts for structuring concepts in this area.
Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
Shen, Macheng, How, Jonathan P
We pose an active perception problem where an autonomous agent actively interacts with a second agent with potentially adversarial behaviors. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the agent intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversary agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's action, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is empirically shown against an adversary that adapts to and exploits the autonomous agent's policy when compared with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.
Probabilistic Relational Agent-based Models
In agent-based models (ABMs, e.g., [4, 3]) agents probabilistically change state. State can be represented as attribute values such as health status, monthly income, age, political orientation, location and so on. A population of agents has a joint state that is typically a joint distribution; for example, a population has a joint distribution over income levels and political beliefs. ABMs are a popular method for exploring the dynamics of joint states, which can be hard to estimate when attribute values depend on each other, and populations are heterogeneous in the sense that not everyone has the same distribution of attribute values, and the principal mechanism for changing attribute values is interactions between agents. For example, suppose all agents have a flu status attribute that changes conditionally - given other attributes such as age, income, and vaccination status - when agents interact. The dynamics of flu - how it moves through heterogeneous populations - can be difficult or impossible to solve, but ABMs can simulate the interactions of agents, and the flu status of these agents can be tracked over time. ABMs are no doubt engines of probabilistic inference, but it is difficult to say anything about the models that underlie the inference. This paper presents pram - Probabilistic Relational Agent-based Models - a new kind of ABM with design influences from compartmental models (e.g., [1]) and probabilistic relational models (PRMs; e.g., [2]).
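The flu example can be sketched as a tiny agent-based simulation in plain Python. This is only an illustration of the general ABM idea and does not use the pram API; the population size, vaccination rate, and infection probabilities are assumptions made up for the sketch, as is the function name `simulate_flu`.

```python
import random


def simulate_flu(n_agents=200, n_steps=30, seed=0):
    """Track the number of infected agents over time (hypothetical model)."""
    rng = random.Random(seed)
    # Assumed attributes: 40% of agents are vaccinated, nobody has flu yet.
    agents = [{"vaccinated": rng.random() < 0.4, "flu": False}
              for _ in range(n_agents)]
    agents[0]["flu"] = True  # a single initial case
    history = []
    for _ in range(n_steps):
        # Snapshot the current status so all updates in a step are synchronous.
        infected_now = [a["flu"] for a in agents]
        for agent in agents:
            if agent["flu"]:
                continue  # no recovery in this sketch
            contact = rng.randrange(n_agents)  # one random interaction
            if infected_now[contact]:
                # Assumed infection probabilities, conditional on vaccination.
                p_infect = 0.1 if agent["vaccinated"] else 0.5
                if rng.random() < p_infect:
                    agent["flu"] = True
        history.append(sum(a["flu"] for a in agents))
    return history


infected_over_time = simulate_flu()
```

The returned list is the tracked population state over time: each entry counts the agents whose flu status is positive after that step, with the change in status conditioned on another attribute (vaccination) and triggered by an interaction.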
NAIL: A General Interactive Fiction Agent
Hausknecht, Matthew, Loynd, Ricky, Yang, Greg, Swaminathan, Adith, Williams, Jason D.
Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL's performance.
Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity
Pathak, Deepak, Lu, Chris, Darrell, Trevor, Isola, Phillip, Efros, Alexei A.
Contemporary sensorimotor learning approaches typically start with an existing complex agent (e.g., a robotic arm), which they learn to control. In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies. Each primitive agent consists of a limb with a motor attached at one end. Limbs may choose to link up to form collectives. When a limb initiates a link-up action and there is another limb nearby, the latter is magnetically connected to the 'parent' limb's motor. This forms a new single agent, which may further link with other agents. In this way, complex morphologies can emerge, controlled by a policy whose architecture is in explicit correspondence with the morphology. We evaluate the performance of these 'dynamic' and 'modular' agents in simulated environments. We demonstrate better generalization to test-time changes both in the environment, as well as in the agent morphology, compared to static and monolithic baselines. Project videos and code are available at https://pathak22.github.io/modular-assemblies/
Weighted Tensor Completion for Time-Series Causal Inference
Mandal, Debmalya, Parkes, David
Marginal Structural Models (MSMs) (Robins, 2000) are the most popular models for causal inference from time-series observational data. However, they have two main drawbacks: (a) they do not capture subject heterogeneity, and (b) they only consider fixed time intervals and do not scale gracefully with longer intervals. In this work, we propose a new family of MSMs to address these two concerns. We model the potential outcomes as a three-dimensional tensor of low rank, where the three dimensions correspond to the agents, time periods and the set of possible histories. Unlike the traditional MSM, we allow the dimensions of the tensor to increase with the number of agents and time periods. We set up a weighted tensor completion problem as our estimation procedure, and show that the solution to this problem converges to the true model in an appropriate sense. Then we show how to solve the estimation problem, providing conditions under which it can be solved approximately and efficiently. Finally, we propose an algorithm based on projected gradient descent, which is easy to implement, and evaluate its performance on a simulated dataset.
Understanding The Impact of Partner Choice on Cooperation and Social Norms by means of Multi-agent Reinforcement Learning
Anastassacos, Nicolas, Hailes, Steve, Musolesi, Mirco
The human ability to coordinate and cooperate has been vital to the development of societies for thousands of years. While it is not fully clear how this behavior arises, social norms are thought to be a key factor in this development. In contrast to laws set by authorities, norms tend to evolve in a bottom-up manner from interactions between members of a society. While much behavior can be explained through the use of social norms, it is difficult to measure the extent to which they shape society as well as how they are affected by other societal dynamics. In this paper, we discuss the design and evaluation of a reinforcement learning model for understanding how the opportunity to choose who you interact with in a society affects the overall societal outcome and the strength of social norms. We first study the emergence of norms and then the emergence of cooperation in the presence of norms. In our model, agents interact with other agents in a society in the form of repeated matrix games: coordination games and cooperation games. In particular, at each stage, agents are either able to choose a partner to interact with or are forced to interact at random, and they learn using policy gradients.
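To make the idea of learning in a repeated coordination game concrete, here is a minimal sketch with two agents and two actions. It is a simplification, not the paper's model: there is no partner choice, the payoff (1 when actions match, 0 otherwise) is an assumption, and the sampled policy-gradient estimate is replaced by the exact expected gradient so that the run is deterministic.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_coordination(theta1=0.4, theta2=0.2, lr=1.0, steps=500):
    """Gradient ascent on the expected payoff of a 2x2 coordination game.

    Each agent plays action A with probability sigmoid(theta).
    Expected payoff for both agents: J = p*q + (1 - p)*(1 - q).
    """
    for _ in range(steps):
        p, q = sigmoid(theta1), sigmoid(theta2)
        # Chain rule: dJ/dp = 2q - 1 and dp/dtheta1 = p(1 - p); same for q.
        grad1 = (2 * q - 1) * p * (1 - p)
        grad2 = (2 * p - 1) * q * (1 - q)
        theta1 += lr * grad1
        theta2 += lr * grad2
    return sigmoid(theta1), sigmoid(theta2)


p_final, q_final = train_coordination()
```

Starting from a slight shared preference for action A, both probabilities drift toward 1, so the agents settle on the same action. Starting from a slight preference for the other action leads to the mirror-image convention, which is one simple way a norm can emerge from the dynamics rather than being imposed.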
eugenevinitsky/sequential_social_dilemma_games
This repo is an open-source implementation of DeepMind's Sequential Social Dilemma (SSD) multi-agent game-theoretic environments [1]. SSDs can be thought of as analogous to spatially and temporally extended Prisoner's Dilemma-like games. The reward structure poses a dilemma because individual short-term optimal strategies lead to poor long-term outcomes for the group. The implemented environments are structured to be compatible with OpenAI's Gym environments (https://github.com/openai/gym). The above plot shows the empirical Schelling diagrams for both Cleanup (A) and Harvest (B) (from [2]).
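The dilemma in the reward structure can be seen in the one-shot Prisoner's Dilemma that SSDs extend. The sketch below uses a conventional textbook payoff table, not values from this repo, and it does not touch the repo's environment API; `PAYOFF` and `best_response` are names introduced only for this illustration.

```python
# Assumed textbook payoffs: (row player reward, column player reward).
PAYOFF = {
    ("C", "C"): (3, 3),  # both cooperate
    ("C", "D"): (0, 5),  # row cooperates, column defects
    ("D", "C"): (5, 0),  # row defects, column cooperates
    ("D", "D"): (1, 1),  # both defect
}


def best_response(opponent_action):
    """Return the row player's reward-maximizing action against a fixed opponent."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])


# Defecting is individually optimal whatever the other player does...
individually_optimal = (best_response("C"), best_response("D"))
# ...yet mutual defection leaves the group worse off than mutual cooperation.
group_loss = sum(PAYOFF[("C", "C")]) - sum(PAYOFF[("D", "D")])
```

In the sequential environments the same tension is spread over space and time, so cooperation and defection are patterns of behavior rather than single actions.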