Agents
Multi-agent Inverse Reinforcement Learning for General-sum Stochastic Games
Lin, Xiaomin, Adams, Stephen C., Beling, Peter A.
This paper addresses the problem of multi-agent inverse reinforcement learning (MIRL) in a two-player general-sum stochastic game framework. Five variants of MIRL are considered: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL, each distinguished by its solution concept. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that aim to maximize the total game value. In problem uCE-MIRL, agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing total game value. Problem uNE-MIRL is similar to uCE-MIRL in total game value maximization, but it is assumed that the agents are playing a Nash equilibrium. Problems advE-MIRL and cooE-MIRL assume agents are playing an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to address these five problems under the assumption that the game observer either knows or is able to accurately estimate the policies and solution concepts for players. For uCS-MIRL, we first develop a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method. For uCE-MIRL, we develop a linear programming problem subject to constraints that define necessary and sufficient conditions for the observed policies to be correlated equilibria. The objective is to choose a solution that not only minimizes the total game value difference between the observed bi-policy and a local uCS, but also maximizes the scale of the solution. We apply a similar treatment to the problem of uNE-MIRL. The remaining two problems can be solved efficiently by taking advantage of solution uniqueness and setting up a convex optimization problem. Results are validated on various benchmark grid-world games.
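To make the correlated-equilibrium constraints mentioned in this abstract concrete, here is a minimal sketch for a one-shot two-player matrix game. This is a simplification and not the paper's method: the paper works with stochastic games and a linear program over rewards, whereas this sketch only checks the standard incentive constraints for a given joint distribution. The function name and the example payoffs (a textbook game of chicken) are illustrative assumptions.

```python
def is_correlated_equilibrium(p, r1, r2, tol=1e-9):
    """Check the correlated-equilibrium incentive constraints.

    p[a][b]  : probability that the mediator recommends (a, b)
    r1[a][b] : payoff of player 1 under the joint action (a, b)
    r2[a][b] : payoff of player 2 under the joint action (a, b)
    A distribution is a correlated equilibrium if no player can gain,
    in expectation, by deviating from any recommended action.
    """
    n, m = len(p), len(p[0])
    # Player 1: recommended row a, candidate deviation alt.
    for a in range(n):
        for alt in range(n):
            gain = sum(p[a][b] * (r1[alt][b] - r1[a][b]) for b in range(m))
            if gain > tol:
                return False
    # Player 2: recommended column b, candidate deviation alt.
    for b in range(m):
        for alt in range(m):
            gain = sum(p[a][b] * (r2[a][alt] - r2[a][b]) for a in range(n))
            if gain > tol:
                return False
    return True


# Illustrative payoffs (assumed): action 0 = dare, action 1 = chicken.
R1 = [[0, 7], [2, 6]]
R2 = [[0, 2], [7, 6]]
third = 1.0 / 3.0
uniform_off_crash = [[0.0, third], [third, third]]
always_chicken = [[0.0, 0.0], [0.0, 1.0]]
```

In this assumed example, the distribution that spreads its mass evenly over every outcome except (dare, dare) satisfies all constraints, while putting all mass on (chicken, chicken) does not, because either player would gain by daring.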
Learning Social Conventions in Markov Games
Lerer, Adam, Peysakhovich, Alexander
Social conventions - arbitrary ways to organize group behavior - are an important part of social life. Any agent that wants to enter an existing society must be able to learn its conventions (e.g. which side of the road to drive on, which language to speak) from relatively few observations or risk being unable to coordinate with everyone else. We consider the game theoretic framework of David Lewis which views the selection of a social convention as the selection of an equilibrium in a coordination game. We ask how to construct reinforcement learning based agents that can solve the convention learning task in the self-play paradigm: at training time the agent has access to a good model of the environment and a small amount of observations about how individuals in society act. The agent then has to construct a policy that is compatible with the test-time social convention. We study three environments from the literature which have multiple conventions: traffic, communication, and risky coordination. In each of these we observe that adding a small amount of imitation learning during self-play training greatly increases the probability that the strategy found by self-play fits well with the social convention the agent will face at test time. We show that this works even in an environment where standard independent multi-agent RL very rarely finds the correct test-time equilibrium.
Finding Optimal Solutions to Token Swapping by Conflict-based Search and Reduction to SAT
We study practical approaches to solving the token swapping (TSWAP) problem optimally in this short paper. In TSWAP, we are given an undirected graph with colored vertices. A colored token is placed in each vertex. A pair of tokens can be swapped between adjacent vertices. The goal is to perform a sequence of swaps so that token and vertex colors agree across the graph. The minimum number of swaps is required in the optimization variant of the problem. We observed similarities between the TSWAP problem and multi-agent path finding (MAPF) where instead of tokens we have multiple agents that need to be moved from their current vertices to given unique target vertices. The difference between both problems consists in local conditions that state transitions (swaps/moves) must satisfy. We developed two algorithms for solving TSWAP optimally by adapting two different approaches to MAPF - CBS and MDD-SAT. This constitutes the first attempt to design optimal solving algorithms for TSWAP. Experimental evaluation on various types of graphs shows that the reduction to SAT scales better than CBS in optimal TSWAP solving.
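The TSWAP problem statement above can be illustrated with a small brute-force solver. This is only a sketch of the optimization variant on tiny instances, using plain breadth-first search over token arrangements; it is not the CBS or MDD-SAT adaptation described in the abstract, and the function name and input encoding are assumptions made for illustration.

```python
from collections import deque


def min_swaps(edges, vertex_colors, token_colors):
    """Return the minimum number of swaps that makes every token color
    match the color of the vertex it sits on, or None if impossible.

    edges         : list of (u, v) pairs, undirected, vertices 0..n-1
    vertex_colors : color of each vertex
    token_colors  : color of the token initially placed on each vertex
    """
    start = tuple(token_colors)
    goal = tuple(vertex_colors)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        for u, v in edges:
            if state[u] == state[v]:
                continue  # swapping two equal colors changes nothing
            nxt = list(state)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt == goal:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # goal arrangement unreachable


# Reversing three distinct tokens on a path 0-1-2 versus on a triangle.
path = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
swaps_on_path = min_swaps(path, ("a", "b", "c"), ("c", "b", "a"))
swaps_on_triangle = min_swaps(triangle, ("a", "b", "c"), ("c", "b", "a"))
```

The state space grows factorially with the number of distinctly colored tokens, so this search is only practical for very small graphs.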
The Temporal Singularity: time-accelerated simulated civilizations and their implications
Provided significant future progress in artificial intelligence and computing, it may ultimately be possible to create multiple Artificial General Intelligences (AGIs), and possibly entire societies living within simulated environments. In that case, it should be possible to improve the problem solving capabilities of the system by increasing the speed of the simulation. If a minimal simulation with sufficient capabilities is created, it might manage to increase its own speed by accelerating progress in science and technology, in a way similar to the Technological Singularity. This may ultimately lead to large simulated civilizations unfolding at extreme temporal speedups, achieving what from the outside would look like a Temporal Singularity. Here we discuss the feasibility of the minimal simulation and the potential advantages, dangers, and connection to the Fermi paradox of the Temporal Singularity. The medium-term importance of the topic derives from the amount of computational power required to start the process, which could be available within the next decades, making the Temporal Singularity theoretically possible before the end of the century.
Game AI Research with Fast Planet Wars Variants
This paper describes a new implementation of Planet Wars, designed from the outset for Game AI research. The skill-depth of the game makes it a challenge for game-playing agents, and the speed of more than 1 million game ticks per second enables rapid experimentation and prototyping. The parameterised nature of the game, together with an interchangeable actuator model, makes it well suited to automated game tuning. The game is designed to be fun to play for humans, and is directly playable by General Video Game AI agents.
Who will win the AI race? If countries work together, then the answer could be all of us
With global cooperation, we can effectively take on the truth we all acknowledge: perhaps more than previous technological breakthroughs in human history, AI brings challenges along with its enormous potential for good. Yet based on our conversations with government officials, academics, entrepreneurs, journalists and other stakeholders over the past year or two, we at Malong Technologies are unabashedly hopeful. We see a broadening consensus and a willingness to address issues together with a sense of shared responsibility. We see people from all walks of life giving serious thought to the roles they can play and the contributions they can make.
Microsoft to acquire Bonsai in move to build 'brains' for autonomous systems - The Official Microsoft Blog
With AI's meteoric rise, autonomous systems have been projected to grow to more than 800 million in operation by 2025. However, while envisioned in science fiction for a long time, truly intelligent autonomous systems are still elusive and remain a holy grail. The reality today is that training autonomous systems that function amidst the many unforeseen situations in the real world is very hard and requires deep expertise in AI -- essentially making it unscalable. To achieve this inflection point in AI's growth, traditional machine learning methodologies aren't enough. Bringing intelligence to autonomous systems at scale will require a unique combination of the new practice of machine teaching, advances in deep reinforcement learning and leveraging simulation for training.
How techies would design an AI agent for support at work
I'm recently back from ServiceNow's Knowledge 18 conference, where teammates and I found a way to break up the isolation of booth detail: we walked the expo floor to poll people about using work tools powered by artificial intelligence (AI). The poll results offer a sliver of insight about how today's tech workforce views AI technology. Sure, given the nature of the event, one may suggest the crowd was biased. Anticipating that would be the case, we steered the dialogue toward design. To be clear, we polled a total of 70 people.
Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop
Biehl, Martin, Guckelsberger, Christian, Salge, Christoph, Smith, Simón C., Polani, Daniel
Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across e.g. different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.
Solving Multi-agent Path Finding on Strongly Biconnected Digraphs
Botea, Adi, Bonusi, Davide, Surynek, Pavel
Much of the literature on suboptimal, polynomial-time algorithms for multi-agent path finding focuses on undirected graphs, where motion is permitted in both directions along a graph edge. Despite this, traveling on directed graphs is relevant in navigation domains, such as path finding in games, and asymmetric communication networks. We consider multi-agent path finding on strongly biconnected directed graphs. We show that all instances with at least two unoccupied positions have a solution, except for a particular, degenerate subclass where the graph has a cyclic shape. We present diBOX, an algorithm for multi-agent path finding on strongly biconnected directed graphs. diBOX runs in polynomial time, computes suboptimal solutions and is complete for instances on strongly biconnected digraphs with at least two unoccupied positions. We theoretically analyze properties of the algorithm and properties of strongly biconnected directed graphs that are relevant to our approach. We perform a detailed empirical analysis of diBOX, showing good scalability. To our knowledge, our work is the first study of multi-agent path finding focused on directed graphs.