Agents
Planning and Synthesis Under Assumptions
Aminof, Benjamin, De Giacomo, Giuseppe, Murano, Aniello, Rubin, Sasha
In Reasoning about Action and Planning, one synthesizes the agent plan by taking advantage of assumptions about how the environment works (that is, one exploits the environment's effects, its fairness, its trajectory constraints). In this paper we study this form of synthesis in detail. We consider assumptions as constraints on the possible strategies that the environment can have in order to respond to the agent's actions. Such constraints may be given in the form of a planning domain (or action theory), as linear-time formulas over infinite or finite runs, or as a combination of the two (e.g., FOND under fairness). We argue, though, that not all assumption specifications are meaningful: they need to be consistent, which means that there must exist an environment strategy fulfilling the assumption in spite of the agent's actions. For such assumptions, we study how to do synthesis/planning for agent goals, ranging from classical reachability to goals on traces specified in LTL and LTLf/LDLf, characterizing the problem both mathematically and algorithmically.
Payoff Control in the Iterated Prisoner's Dilemma
The repeated game has long been the touchstone model for agents' long-run relationships. Previous results suggest that it is particularly difficult for a repeated game player to exert autocratic control over the payoffs since they are jointly determined by all participants. This work discovers that the scale of a player's capability to unilaterally influence the payoffs may have been much underestimated. Under the conventional iterated prisoner's dilemma, we develop a general framework for controlling the feasible region where the players' payoff pairs lie. A control strategy player is able to confine the payoff pairs to her objective region, as long as this region has feasible linear boundaries. With this framework, many well-known existing strategies can be categorized and various new strategies with nice properties can be further identified. We show that the control strategies perform well either in a tournament or against a human-like opponent.
Deep Reinforcement Learning for Swarm Systems
Hüttenrauch, Maximilian, Šošić, Adrian, Neumann, Gerhard
Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
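The two properties named in the abstract above (interchangeable agents, irrelevant swarm size) can be illustrated with a minimal sketch of an empirical mean embedding over radial basis function features. This is not the authors' implementation: the scalar observations, the RBF centers, the width, and the function names rbf_features and mean_embedding are all assumptions made for illustration.

```python
import math

def rbf_features(x, centers, width=1.0):
    """Map one scalar observation to RBF activations (centers/width are assumed values)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def mean_embedding(neighbor_obs, centers, width=1.0):
    """Average the feature vectors of all observed neighbors.

    The output length depends only on the number of centers,
    not on how many neighbors are observed.
    """
    if not neighbor_obs:
        return [0.0] * len(centers)
    feats = [rbf_features(x, centers, width) for x in neighbor_obs]
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]

centers = [0.0, 1.0, 2.0]                      # hypothetical feature centers
obs = [0.2, 1.5, 1.9]                          # hypothetical neighbor distances
a = mean_embedding(obs, centers)
b = mean_embedding([1.9, 0.2, 1.5], centers)   # same neighbors, different order
c = mean_embedding(obs + obs, centers)         # every neighbor duplicated
```

Up to floating-point rounding, a, b and c coincide: reordering the neighbors or duplicating the whole set leaves the embedding unchanged, whereas a concatenated state vector would change in both cases.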
An agent-based model of an endangered population of the Arctic fox from Mednyi Island
Brilliantova, Angelina, Pletenev, Anton, Doronina, Liliya, Hosseini, Hadi
Artificial Intelligence techniques such as agent-based modeling and probabilistic reasoning have shown promise in modeling complex biological systems and testing ecological hypotheses through simulation. We develop an agent-based model of Arctic foxes from Mednyi Island while utilizing Probabilistic Graphical Models to capture the conditional dependencies between the random variables. Such models provide valuable insights in analyzing factors behind the catastrophic degradation of this population and in revealing evolutionary mechanisms of its persistence in a high-density environment. Using empirical data from studies on Mednyi Island, we create a realistic model of Arctic foxes as agents, and study their survival and population dynamics under a variety of conditions.
Shielded Decision-Making in MDPs
Jansen, Nils, Könighofer, Bettina, Junges, Sebastian, Bloem, Roderick
A prominent problem in artificial intelligence and machine learning is the safe exploration of an environment. In particular, reinforcement learning is a well-known technique to determine optimal policies for complicated dynamic systems, but suffers from the fact that such policies may induce harmful behavior. We present the concept of a shield that forces decision-making to provably adhere to safety requirements with high probability. Our method exploits the inherent uncertainties in scenarios given by Markov decision processes. We present a method to compute probabilities of decision making regarding temporal logic constraints. We use that information to realize a shield that, when applied to a reinforcement learning algorithm, ensures (near-)optimal behavior both for the safety constraints and for the actual learning objective. In our experiments, we show on the arcade game PAC-MAN that the learning efficiency increases as the learning needs orders of magnitude fewer episodes. We show tradeoffs between sufficient progress in exploration of the environment and ensuring strict safety.
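The shielding idea in the abstract above can be sketched as a filter placed between a learner and its action choice. The following is a minimal illustration under stated assumptions, not the authors' construction: the per-action safety probabilities are taken as already computed, and the relative threshold delta, the blocking rule, and the names shielded_actions and choose_action are hypothetical.

```python
import random

def shielded_actions(safety_prob, delta):
    """Keep the actions whose safety probability is at least delta times the best one.

    safety_prob: dict action -> probability of satisfying the safety constraint
    (assumed to be precomputed for the current state).
    """
    best = max(safety_prob.values())
    return [a for a, p in safety_prob.items() if p >= delta * best]

def choose_action(q_values, safety_prob, delta, epsilon=0.0, rng=random):
    """Epsilon-greedy choice restricted to the actions the shield allows."""
    allowed = shielded_actions(safety_prob, delta)
    if rng.random() < epsilon:
        return rng.choice(allowed)                  # explore, but only among allowed actions
    return max(allowed, key=lambda a: q_values[a])  # exploit the learner's estimates

# Hypothetical numbers for a single state.
safety_prob = {"left": 0.99, "right": 0.40, "stay": 0.95}
q_values = {"left": 0.1, "right": 1.0, "stay": 0.5}
picked = choose_action(q_values, safety_prob, delta=0.9)
```

With delta=0.9 the action "right" is blocked despite having the highest value estimate, so the learner picks "stay". Lowering delta toward 0 admits more actions, which gives a rough picture of the tradeoff between exploration progress and strict safety mentioned in the abstract.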
Generative Adversarial Imitation from Observation
Torabi, Faraz, Warnell, Garrett, Stone, Peter
Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions. The lack of action information both distinguishes IfO from most of the literature in imitation learning and sets it apart as a method that may enable agents to learn from a large set of previously inapplicable resources such as internet videos. In this paper, we propose both a general framework for IfO approaches and a new IfO approach based on generative adversarial networks called generative adversarial imitation from observation (GAIfO). We demonstrate that this approach performs comparably to classical imitation learning approaches (which have access to the demonstrator's actions) and significantly outperforms existing imitation from observation methods in high-dimensional simulation environments.
A Mathematical Account of Soft Evidence, and of Jeffrey's 'destructive' versus Pearl's 'constructive' updating
Evidence in probabilistic reasoning may be 'hard' or 'soft', that is, it may be of yes/no form, or it may involve a strength of belief in the unit interval [0,1]. Reasoning with soft, [0,1]-valued evidence is important in many situations but may lead to different, confusing interpretations. This paper intends to bring more mathematical clarity to the field by shifting the existing focus from specification of soft evidence to accommodation of soft evidence. There are two main approaches, known as Jeffrey's rule and Pearl's method, which give different outcomes on soft evidence. This paper describes these two approaches as different ways of updating with soft evidence, highlighting their differences, similarities and applications. This account is based on a novel channel-based approach to Bayesian probability. Proper understanding of these two update mechanisms is highly relevant for inference, decision tools and probabilistic programming languages.
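To see how the two update rules can disagree on the same soft evidence, here is a small worked sketch using the standard textbook forms of Jeffrey's rule and Pearl's method. The disease/test numbers and the function names are assumptions chosen for illustration, and the dictionary called channel (the conditional probabilities of the test outcome given the hypothesis) only loosely echoes the channel-based view mentioned in the abstract.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def pearl_update(prior, channel, likelihood):
    """Pearl's method: treat the soft evidence as a [0,1]-valued likelihood on outcomes."""
    return normalize({
        h: prior[h] * sum(channel[h][e] * likelihood[e] for e in likelihood)
        for h in prior
    })

def jeffrey_update(prior, channel, target):
    """Jeffrey's rule: mix the ordinary Bayesian posteriors with the target outcome weights."""
    posterior = {h: 0.0 for h in prior}
    for e, weight in target.items():
        cond = normalize({h: prior[h] * channel[h][e] for h in prior})  # P(h | e)
        for h in prior:
            posterior[h] += weight * cond[h]
    return posterior

# Hypothetical numbers: a rare disease and an imperfect test.
prior = {"d": 0.05, "nd": 0.95}
channel = {"d": {"pos": 0.9, "neg": 0.1}, "nd": {"pos": 0.05, "neg": 0.95}}
soft = {"pos": 0.8, "neg": 0.2}   # 80% belief that the test was positive

jeffrey = jeffrey_update(prior, channel, soft)
pearl = pearl_update(prior, channel, soft)
hard = {"pos": 1.0, "neg": 0.0}   # hard evidence: the test was certainly positive
```

With these numbers Jeffrey's rule gives a posterior disease probability of roughly 0.39, while Pearl's method gives roughly 0.14. With the hard evidence, both rules reduce to the ordinary Bayesian posterior of roughly 0.49.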
Generalization in quasi-periodic environments
Bellettini, Giovanni, Betti, Alessandro, Gori, Marco
By and large, the behavior of stochastic gradient is regarded as a challenging problem, and it is often presented in the framework of statistical machine learning. This paper offers a novel view on the analysis of on-line models of learning that arises when dealing with a generalized version of stochastic gradient that is based on dissipative dynamics. In order to face the complex evolution of these models, a systematic treatment is proposed which is based on energy balance equations that are derived by means of the Caldirola-Kanai (CK) Hamiltonian. According to these equations, learning can be regarded as an ordering process which corresponds with the decrement of the loss function. Finally, the main result established in this paper is that in the case of quasi-periodic environments, where the pattern novelty is progressively limited as time goes by, the system dynamics yields an asymptotically consistent solution in the weight space, that is, the solution maps similar patterns to the same decision.
Talk the Walk: Navigating New York City through Grounded Dialogue
de Vries, Harm, Shuster, Kurt, Batra, Dhruv, Parikh, Devi, Weston, Jason, Kiela, Douwe
We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception. The task involves two agents (a "guide" and a "tourist") that communicate via natural language in order to achieve a common goal: having the tourist navigate to a given target location. The task and dataset, which are described in detail, are challenging and their full solution is an open problem that we pose to the community. We (i) focus on the task of tourist localization and develop the novel Masked Attention for Spatial Convolutions (MASC) mechanism that allows for grounding tourist utterances into the guide's map, (ii) show it yields significant improvements for both emergent and natural language communication, and (iii) use this method to establish non-trivial baselines on the full task.
Forget Killer Robots: Autonomous Weapons Are Already Online
Earlier this year, concerns over the development of autonomous military systems -- essentially AI-driven machinery capable of making battlefield decisions, including the selection of targets -- were once again the center of attention at a United Nations meeting in Geneva. "Where is the line going to be drawn between human and machine decision-making?" Paul Scharre, director of the Technology and National Security Program at the Center for a New American Security in Washington, D.C., told Time magazine. "Are we going to be willing to delegate lethal authority to the machine?" "Malicious computer programs that could be described as 'intelligent autonomous agents' are what steal people's data."