 Agents


Automated Gaming Pommerman: FFA

arXiv.org Artificial Intelligence

These MARL models use training policy coordination mechanisms to minimize the impact of the training tasks.


AI smokes 5 poker champs at a time in no-limit Hold'em with 'relentless consistency' – TechCrunch


The machines have proven their superiority in one-on-one games like chess and Go, and even poker, but in complex multiplayer versions of the card game humans have retained their edge… until now. An evolution of the last AI agent to flummox poker pros individually is now decisively beating them in a championship-style six-person game. As documented in a paper published today in the journal Science, the CMU/Facebook collaboration they call Pluribus reliably beats five professional poker players in the same game, or one pro pitted against five independent copies of itself. It's a major leap forward in capability for the machines, and, remarkably, it is also far more efficient than previous agents. One-on-one poker is a weird game, and not a simple one, but its zero-sum nature (whatever you lose, the other player gets) makes it susceptible to certain strategies in which a computer able to calculate far enough ahead can put itself at an advantage.


From Observability to Significance in Distributed Information Systems

arXiv.org Artificial Intelligence

To understand and explain process behaviour, we need to be able to see it and decide its significance, i.e. be able to tell a story about its behaviours. This paper describes a few of the modelling challenges that underlie the monitoring and observation of processes in IT, whether by humans or by software. The observability of systems has recently gained prominence in connection with computer monitoring and the tracing of processes for debugging and forensics. It raises the issue of well-known principles of measurement in bounded contexts, but these issues have been left implicit in the Computer Science literature. This paper aims to remedy this omission by laying out a simple promise-theoretic model, summarizing a long-standing trail of work on the observation of distributed systems, based on the elementary distinguishability of observations and classical causality, with history. Three distinct views of a system are sought, across a number of scales, that describe how information is transmitted (and lost) as it moves around the system and is aggregated into journals and logs.


Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle

arXiv.org Artificial Intelligence

Many real-world human behaviors can be characterized as sequential decision-making processes, such as urban travelers' choices of transport modes and routes (Wu et al. 2017). Unlike choices controlled by machines, which in general follow perfect rationality and adopt the policy with the highest reward, studies have revealed that human agents make sub-optimal decisions under bounded rationality (Tao, Rohde, and Corcoran 2014). Such behaviors can be modeled using the maximum causal entropy (MCE) principle (Ziebart 2010). In this paper, we define and investigate a general reward transformation problem (namely, reward advancement): recovering the range of additional reward functions that transform the agent's policy from its original policy to a predefined target policy under the MCE principle. We show that, given an MDP and a target policy, there are infinitely many additional reward functions that can achieve the desired policy transformation. Moreover, we propose an algorithm to further extract the additional rewards with minimum "cost" to implement the policy transformation.
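The claim that infinitely many additional rewards can induce the same target policy can be illustrated with a small sketch. The assumptions here are ours, not the paper's. The MDP is discounted and tabular, and soft value iteration serves as the MCE solver. Any state function `phi` (a hypothetical parameter) plugged into a potential-style construction yields a valid additional reward. This is not the paper's minimum-cost extraction algorithm.

```python
import math

def soft_value_iteration(P, R, gamma, iters=2000):
    """Soft (maximum causal entropy) value iteration on a small tabular MDP.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the reward.
    Returns the MCE policy pi[s][a] = exp(Q(s, a) - V(s)).
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
              for a in range(len(R[s]))] for s in range(n)]
        # Soft Bellman backup: V(s) = log sum_a exp Q(s, a)
        V = [math.log(sum(math.exp(q) for q in Q[s])) for s in range(n)]
    return [[math.exp(Q[s][a] - V[s]) for a in range(len(R[s]))] for s in range(n)]

def additional_reward(P, R, target, gamma, phi):
    """One additional reward that makes `target` the MCE policy (hypothetical construction).

    With total reward log target(a|s) + phi(s) - gamma * E[phi(s')], the soft
    Bellman fixed point is V = phi and Q(s, a) = log target(a|s) + phi(s), so
    exp(Q - V) = target. Every choice of phi gives a different additional reward.
    """
    return [[math.log(target[s][a]) + phi[s]
             - gamma * sum(p * phi[s2] for s2, p in P[s][a].items())
             - R[s][a]
             for a in range(len(R[s]))] for s in range(len(R))]

# Hypothetical 2-state, 2-action MDP and target policy.
P = [[{0: 0.8, 1: 0.2}, {1: 1.0}],
     [{0: 1.0}, {0: 0.1, 1: 0.9}]]
R = [[1.0, 0.0], [0.0, 2.0]]
target = [[0.3, 0.7], [0.6, 0.4]]
gamma = 0.9

for phi in ([0.0, 0.0], [1.0, -2.0]):  # two different potentials
    extra = additional_reward(P, R, target, gamma, phi)
    total = [[R[s][a] + extra[s][a] for a in range(2)] for s in range(2)]
    pi = soft_value_iteration(P, total, gamma)
    print([round(x, 3) for x in pi[0] + pi[1]])  # → [0.3, 0.7, 0.6, 0.4] both times
```

The two additional rewards differ, yet both yield the target policy. Choosing among such rewards by a cost criterion is the problem the paper's algorithm addresses.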


Rethink Global Reward Game and Credit Assignment in Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games in which all agents aim to maximize cumulative global rewards. Credit assignment is an important problem studied in the global reward game. Most works adopt a non-cooperative game-theoretic framework with the shared reward approach, i.e., each agent is directly assigned the shared global reward. This, however, may give each agent inaccurate feedback on its contribution to the group. In this paper, we introduce a cooperative game-theoretic framework and extend it to the finite-horizon case. We show that our proposed framework is a superset of the global reward game. Based on this framework, we propose an algorithm called Shapley Q-value policy gradient (SQPG) to learn a local reward approach that can distribute the cumulative global reward fairly, reflecting each agent's own contribution, in contrast to the shared reward approach. We evaluate our method on Cooperative Navigation, Prey-and-Predator and Traffic Junction, comparing it with MADDPG, COMA, independent actor-critic and independent DDPG. In the experiments, our algorithm shows better convergence than the baselines.
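The fair-division idea behind SQPG is the Shapley value from cooperative game theory. A minimal sketch of the classical, exact Shapley value on a toy game follows. The game, its characteristic function `team_reward`, and the names are hypothetical. SQPG itself learns Shapley Q-values rather than enumerating coalitions like this.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: each player's marginal contribution v(C + p) - v(C),
    averaged over every order in which the grand coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

# Hypothetical global-reward game: the team earns 1 per distinct landmark
# covered, and agent "b" covers landmark 1 redundantly with agent "a".
covers = {"a": {1}, "b": {1, 2}, "c": {3}}

def team_reward(coalition):
    return len(set().union(*(covers[p] for p in coalition)))

print(shapley_values(["a", "b", "c"], team_reward))
# → {'a': 0.5, 'b': 1.5, 'c': 1.0}
```

A shared-reward scheme would give every agent the full global reward of 3. The Shapley values instead sum to 3 and credit "a" less, because its landmark is already covered by "b".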


Optimal and Bounded-Suboptimal Multi-Agent Motion Planning

AAAI Conferences

Multi-Agent Motion Planning (MAMP) is the task of finding conflict-free, kinodynamically feasible plans for agents from start to goal states. While MAMP is of significant practical importance, existing solvers are either incomplete or inefficient, or they rely on simplifying assumptions. For example, Multi-Agent Path Finding (MAPF) solvers conventionally assume discrete timesteps and rectilinear movement of agents between neighboring vertices of a graph. In this paper, we develop MAMP solvers that obviate these simplifying assumptions yet generalize the core ideas of state-of-the-art MAPF solvers. Specifically, since different motions may take arbitrarily different durations, MAMP solvers need to reason efficiently with continuous time and arbitrary wait durations. To do so, we adapt (Enhanced) Conflict-Based Search to continuous time and develop a novel bounded-suboptimal extension of Safe Interval Path Planning, called Soft Conflict Interval Path Planning. On the theoretical side, we justify the completeness, optimality and bounded-suboptimality of our MAMP solvers. On the experimental side, we show that our MAMP solvers scale well with increasing suboptimality bounds.
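Safe Interval Path Planning reasons about continuous time by splitting each location's timeline into safe intervals. These are maximal periods during which no collision occurs there. A minimal sketch of that bookkeeping for one vertex follows, assuming the unsafe periods are already known as `(start, end)` pairs. The function names are hypothetical, and the paper's Soft Conflict Interval Path Planning extension is not reproduced.

```python
import math

def safe_intervals(unsafe, horizon=math.inf):
    """Complement of (possibly overlapping) unsafe intervals within [0, horizon)."""
    result = []
    t = 0.0
    for start, end in sorted(unsafe):
        if start > t:                # gap before this unsafe period is safe
            result.append((t, start))
        t = max(t, end)              # merge overlapping unsafe periods
    if t < horizon:
        result.append((t, horizon))
    return result

def earliest_entry(intervals, t):
    """Earliest time >= t inside a safe interval [start, end), or None.

    Simplification: assumes the agent can wait safely until that time,
    which real SIPP must check at the predecessor location.
    """
    for start, end in intervals:
        if t < end:
            return max(t, start)
    return None

ints = safe_intervals([(2.0, 3.5), (3.0, 4.0), (6.0, 7.0)])
print(ints)                       # → [(0.0, 2.0), (4.0, 6.0), (7.0, inf)]
print(earliest_entry(ints, 2.5))  # → 4.0
```

Searching over (location, safe interval) pairs instead of (location, timestep) pairs keeps the state space finite even though wait durations are continuous.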


Measuring the Vulnerability of a Multi-Agent Pathfinding Solution

AAAI Conferences

Multi-agent pathfinding is the problem of finding non-interfering paths for a set of agents such that, if the agents follow these paths, each agent will reach its desired destination. Recent years have seen tremendous advances in this field, with optimal and suboptimal algorithms able to plan paths for over 100 agents in reasonable time. However, autonomous mobile agents are prime targets for cyber-security attacks, in which an adversary may take control of an agent to disrupt the agents' execution of their plan. This threat raises two questions. The first is how much damage an agent can do if it does not follow its plan. The second is how one can plan a priori to be as robust as possible to such cyber-attacks. In this work, we provide an answer to both questions. To compute the maximal amount of damage that an adversary agent can do, we define a corresponding graph search problem and solve it with A*. Then, we provide a very simple method to choose a solution that is robust to such damage. We demonstrate both algorithms in simulation over standard multi-agent pathfinding domains.
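The damage computation is cast as a graph search solved with A*. For reference, a minimal sketch of generic A* on a 4-connected grid with unit move costs and a Manhattan-distance heuristic follows. The grid encoding (`'#'` for blocked cells) and the function name are assumptions. The paper's damage-specific state space is not reproduced here.

```python
import heapq
import math

def astar_grid(grid, start, goal):
    """A* on a 4-connected grid ('#' = blocked) with unit costs.

    The Manhattan-distance heuristic is admissible here, so the returned
    list of cells is a shortest path; None means the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f = g + h, g, cell)
    g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:                 # walk parents back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > g[cell]:
            continue                     # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                new_cost = cost + 1
                if new_cost < g.get(nxt, math.inf):
                    g[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None

grid = ["....",
        ".##.",
        "...."]
path = astar_grid(grid, (1, 0), (1, 3))
print(len(path) - 1)  # → 5 (detour around the wall)
```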


Unbounded Sub-Optimal Conflict-Based Search in Complex Domains

AAAI Conferences

Conflict-Based Search (CBS) is a state-of-the-art algorithm for multi-agent pathfinding (MAPF). CBS has been studied in many domains; however, most research has focused on classic domains with point agents that move with unit time steps and unit costs. In this work, we are interested in MAPF solutions for both classic and complex domains, the latter being domains that include shaped agents, actions with non-unit costs, non-uniform action durations and/or non-holonomic or kinodynamic movement constraints. Prior work on sub-optimal formulations of CBS has focused on heuristics. Instead, our work introduces new types of constraints. We show that certain constraint formulations have properties that can make CBS run orders of magnitude faster, but may also make the algorithm incomplete and yield sub-optimal results. We introduce new conditional constraints that allow CBS to exploit these speed-enhancing properties while still retaining algorithmic completeness. We additionally formulate a new constraint accumulation technique called constraint overloading, which uses conditional constraints to achieve further performance gains.
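For orientation, a minimal sketch of the two standard CBS building blocks that the new constraint types modify follows. The first detects the earliest conflict between agents' paths. The second branches on it by forbidding the conflicting location for one agent in each child node. This assumes the classic setting of discrete timesteps, with agents waiting at their final vertex after arrival. The path and constraint representations are hypothetical, and the paper's conditional constraints and constraint overloading are not shown.

```python
def first_conflict(paths):
    """Earliest vertex or edge (swap) conflict among discrete-time paths.

    paths: dict agent -> list of vertices, one per timestep.
    Returns (kind, agent_i, agent_j, location, t) or None.
    """
    def pos(path, t):
        return path[min(t, len(path) - 1)]   # wait at goal after arrival

    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())
    for t in range(horizon):
        for k, i in enumerate(agents):
            for j in agents[k + 1:]:
                pi_t, pj_t = pos(paths[i], t), pos(paths[j], t)
                if pi_t == pj_t:
                    return ("vertex", i, j, pi_t, t)
                if (t + 1 < horizon and pi_t == pos(paths[j], t + 1)
                        and pj_t == pos(paths[i], t + 1)):
                    return ("edge", i, j, (pi_t, pj_t), t)
    return None

def branch(constraints, conflict):
    """Standard CBS split: each child forbids the conflict for one of the two agents."""
    kind, i, j, loc, t = conflict
    if kind == "vertex":
        return [constraints | {(i, loc, t)}, constraints | {(j, loc, t)}]
    u, w = loc   # agent i traverses u -> w while agent j traverses w -> u
    return [constraints | {(i, (u, w), t)}, constraints | {(j, (w, u), t)}]

c = first_conflict({"a": ["A", "B", "C"], "b": ["D", "B", "E"]})
print(c)                        # → ('vertex', 'a', 'b', 'B', 1)
print(branch(frozenset(), c))
```

In full CBS, each child re-plans only the newly constrained agent with a low-level search and is expanded in order of total cost.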


Unifying Search-Based and Compilation-Based Approaches to Multi-Agent Path Finding through Satisfiability Modulo Theories

AAAI Conferences

We describe an attempt to unify search-based and compilation-based approaches to multi-agent path finding (MAPF) through satisfiability modulo theories (SMT). The task in MAPF is to navigate agents in an undirected graph to given goal vertices so that they do not collide. We rephrase Conflict-Based Search (CBS), one of the state-of-the-art algorithms for optimal MAPF solving, in terms of SMT. This idea combines SAT-based solving, known from MDD-SAT (a SAT-based optimal MAPF solver), at the low level with the conflict elimination of CBS at the high level. Where standard CBS branches the search after a conflict occurs, we instead refine the propositional model with a disjunctive constraint. Our novel algorithm, called SMT-CBS, hence does not branch at the high level but incrementally extends the propositional model that is passed to the SAT solver at each iteration. We experimentally compare SMT-CBS with CBS and MDD-SAT.
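Where CBS would branch on a conflict, SMT-CBS adds a disjunction to the propositional model. The sketch below only shows what such a clause looks like for a vertex conflict. The variable numbering, the function names and the brute-force `satisfies` check are hypothetical. A real encoding, such as MDD-SAT's, also needs movement and single-location constraints that are omitted here.

```python
from itertools import product

def var_id(agent, vertex, t, n_vertices, horizon):
    """Hypothetical DIMACS-style index (1-based) of Boolean x[agent][vertex][t],
    meaning 'agent occupies vertex at timestep t'."""
    return 1 + (agent * horizon + t) * n_vertices + vertex

def conflict_clause(a, b, vertex, t, n_vertices, horizon):
    """Disjunctive constraint: (not x[a][v][t]) or (not x[b][v][t]).

    Negative integers denote negated literals. Each literal corresponds to one
    of the two children that CBS would create by branching on this conflict.
    """
    return [-var_id(a, vertex, t, n_vertices, horizon),
            -var_id(b, vertex, t, n_vertices, horizon)]

def satisfies(assignment, clauses):
    """True if every clause contains a literal made true by `assignment`."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# Agents 0 and 1 were found colliding at vertex 1, timestep 1
# (3 vertices, horizon 2). The model grows instead of the search tree.
model = []
model.append(conflict_clause(0, 1, 1, 1, n_vertices=3, horizon=2))
print(model)  # → [[-5, -11]]
for xa, xb in product([False, True], repeat=2):
    print(xa, xb, satisfies({5: xa, 11: xb}, model))
```

Only the assignment in which both agents occupy the vertex is ruled out. The SAT solver is left to decide which agent yields, which is the decision CBS would otherwise make by branching.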


Multi-Agent Path Finding with Continuous Time and Geometric Agents Viewed through Satisfiability Modulo Theories (SMT)

AAAI Conferences

This paper addresses a variant of multi-agent path finding (MAPF) in continuous space and time. We present a new solving approach based on satisfiability modulo theories (SMT) to obtain makespan-optimal solutions. Standard MAPF is the task of navigating agents in an undirected graph from given start vertices to given goal vertices so that agents do not collide with each other in vertices of the graph. In the continuous version (MAPF-R), agents move in an n-dimensional Euclidean space along straight lines that interconnect predefined positions. For simplicity, we work with circular omni-directional agents having constant velocities in the 2D plane. As agents can have different sizes and move smoothly along lines, a movement along certain lines that is collision-free for small agents can result in a collision if the same movement is performed by larger agents. Our SMT-based approach for MAPF-R, called SMT-CBS-R, reformulates the Conflict-Based Search (CBS) algorithm in terms of SMT concepts. We suggest lazy generation of decision variables and constraints: each time a new conflict is discovered, the underlying encoding is extended with new variables and constraints to eliminate the conflict. We experimentally compared SMT-CBS-R with adaptations of CBS for the continuous variant of MAPF.
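The size-dependence of collisions for circular agents moving at constant velocity can be checked with a standard closest-approach test. A minimal sketch follows. It assumes both agents start moving at time 0 and move for the same duration, and it treats exact touching as collision-free. These are our simplifications, not necessarily the paper's collision model.

```python
import math

def discs_collide(p1, v1, r1, p2, v2, r2, duration):
    """Check whether two circular agents overlap while moving in straight lines.

    Assumed model: both agents start at time 0 at positions p1, p2 and move
    with constant velocities v1, v2 for `duration` time units; r1, r2 are
    radii. Touching (distance exactly r1 + r2) counts as non-colliding.
    """
    # Relative position d(t) = dp + dv * t; |d(t)|^2 is a convex quadratic in t.
    dpx, dpy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    a = dvx * dvx + dvy * dvy
    b = dpx * dvx + dpy * dvy
    # Time of closest approach, clamped to [0, duration]; constant gap if a == 0.
    t_star = 0.0 if a == 0 else min(max(-b / a, 0.0), duration)
    gap = math.hypot(dpx + dvx * t_star, dpy + dvy * t_star)
    return gap < r1 + r2

# The same crossing movements on parallel lines one unit apart:
# collision-free for small agents, colliding for larger ones.
print(discs_collide((0, 0), (1, 0), 0.4, (4, 1), (-1, 0), 0.4, 4.0))  # → False
print(discs_collide((0, 0), (1, 0), 0.6, (4, 1), (-1, 0), 0.6, 4.0))  # → True
```

Because the squared distance is a convex quadratic in time, clamping its unconstrained minimizer to the movement window gives the exact minimum distance over that window.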