AITopics

1907.06096

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence, Jul-12-2019, 06:36:45 GMT

AI smokes 5 poker champs at a time in no-limit Hold'em with 'relentless consistency' – TechCrunch

The machines have proven their superiority in one-on-one games like chess and Go, and even poker, but in complex multiplayer versions of the card game humans have retained their edge… until now. An evolution of the last AI agent to flummox poker pros individually is now decisively beating them in a championship-style six-person game. As documented in a paper published today in the journal Science, the CMU/Facebook collaboration they call Pluribus reliably beats five professional poker players in the same game, or one pro pitted against five independent copies of itself. It's a major leap forward in capability for the machines, and, remarkably, it is also far more efficient than previous agents. One-on-one poker is a weird game, and not a simple one, but its zero-sum nature (whatever you lose, the other player gets) makes it susceptible to certain strategies in which a computer able to calculate far enough ahead can put itself at an advantage.

artificial intelligence, pluribus, social media

#artificialintelligence

Genre: Research Report (0.89)

Industry: Leisure & Entertainment > Games > Poker (0.49)

Technology:

Information Technology > Communications > Social Media (0.59)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.50)
Information Technology > Artificial Intelligence > Games (0.48)

arXiv.org Artificial Intelligence, Jul-12-2019

From Observability to Significance in Distributed Information Systems

Burgess, Mark

To understand and explain process behaviour, we need to be able to see it and decide its significance, i.e. be able to tell a story about its behaviours. This paper describes a few of the modelling challenges that underlie monitoring and observation of processes in IT, by humans or by software. The topic of the observability of systems has recently gained prominence in connection with computer monitoring and tracing of processes for debugging and forensics. It raises well-known principles of measurement in bounded contexts, but these issues have been left implicit in the Computer Science literature. This paper aims to remedy this omission by laying out a simple promise-theoretic model, summarizing a long-standing trail of work on the observation of distributed systems, based on elementary distinguishability of observations and classical causality, with history. Three distinct views of a system are sought, across a number of scales, that describe how information is transmitted (and lost) as it moves around the system and is aggregated into journals and logs.

data mining, machine learning, natural language

1907.05636

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

arXiv.org Artificial Intelligence, Jul-11-2019

Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle

Wu, Guojun, Li, Yanhua, Liu, Zhenming, Bao, Jie, Zheng, Yu, Ye, Jieping, Luo, Jun

Many real-world human behaviors can be characterized as sequential decision-making processes, such as urban travelers' choices of transport modes and routes (Wu et al. 2017). Unlike choices made by machines, which generally follow perfect rationality and adopt the policy with the highest reward, studies have revealed that human agents make sub-optimal decisions under bounded rationality (Tao, Rohde, and Corcoran 2014). Such behaviors can be modeled using the maximum causal entropy (MCE) principle (Ziebart 2010). In this paper, we define and investigate a general reward transformation problem (namely, reward advancement): recovering the range of additional reward functions that transform the agent's policy from its original policy to a predefined target policy under the MCE principle. We show that, given an MDP and a target policy, there are infinitely many additional reward functions that can achieve the desired policy transformation. Moreover, we propose an algorithm to extract the additional rewards with minimum "cost" that implement the policy transformation.
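One well-known reason why many additional rewards can induce the same MCE policy can be checked numerically: under soft (MCE) value iteration, adding a potential-based term gamma*phi(s') - phi(s) to the reward shifts the soft Q-function by -phi(s), and the per-state softmax policy is unchanged. The sketch below assumes a hypothetical 3-state, 2-action deterministic MDP and potential phi; it illustrates the MCE policy and this invariance only, and is not the paper's reward-advancement algorithm or its minimum-cost extraction.

```python
import math

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions,
# deterministic transitions next_state[s][a] and base rewards reward[s][a].
GAMMA = 0.9
next_state = [[1, 2], [2, 0], [0, 1]]
reward = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]


def soft_policy(rew, iters=500):
    """Soft (maximum causal entropy) value iteration.

    Q(s,a) = r(s,a) + gamma * V(s'),  V(s) = log sum_a exp Q(s,a),
    and the MCE policy is pi(a|s) = exp(Q(s,a) - V(s)).
    """
    n = len(rew)
    v = [0.0] * n
    for _ in range(iters):
        q = [[rew[s][a] + GAMMA * v[next_state[s][a]] for a in range(len(rew[s]))]
             for s in range(n)]
        v = [math.log(sum(math.exp(x) for x in q[s])) for s in range(n)]
    return [[math.exp(q[s][a] - v[s]) for a in range(len(rew[s]))] for s in range(n)]


def add_potential_shaping(rew, phi):
    """Return r'(s,a) = r(s,a) + gamma * phi(s') - phi(s) for a potential phi."""
    return [[rew[s][a] + GAMMA * phi[next_state[s][a]] - phi[s]
             for a in range(len(rew[s]))] for s in range(len(rew))]


pi_original = soft_policy(reward)
# A different reward function (hypothetical potential), yet the same MCE policy.
pi_shaped = soft_policy(add_potential_shaping(reward, [3.0, -1.0, 0.5]))
```

Since every choice of phi yields a distinct reward with the same policy, this family alone is already infinite; the paper's own characterization of the feasible range may be broader.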

artificial intelligence, machine learning, reinforcement learning

1907.0539

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (0.95)
Transportation > Passenger (0.73)
Transportation > Ground (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

arXiv.org Artificial Intelligence, Jul-11-2019

Rethink Global Reward Game and Credit Assignment in Multi-agent Reinforcement Learning

Wang, Jianhong, Zhang, Yuan, Kim, Tae-Kyun, Gu, Yunjie

Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games in which all agents aim to maximize the cumulative global reward. Credit assignment is an important problem studied in the global reward game. Most prior work adopts a non-cooperative game-theoretic framework with the shared reward approach, i.e., each agent is assigned the shared global reward directly. This, however, may give each agent inaccurate feedback on its contribution to the group. In this paper, we introduce a cooperative game-theoretic framework and extend it to the finite-horizon case. We show that our proposed framework is a superset of the global reward game. Based on this framework, we propose an algorithm called Shapley Q-value policy gradient (SQPG) to learn a local reward approach that can distribute the cumulative global reward fairly, reflecting each agent's own contribution, in contrast to the shared reward approach. We evaluate our method on the Cooperative Navigation, Prey-and-Predator and Traffic Junction tasks, comparing it with MADDPG, COMA, Independent actor-critic and Independent DDPG. In the experiments, our algorithm shows better convergence than the baselines.
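The fairness notion behind Shapley Q-values comes from the Shapley value of cooperative game theory, which credits each agent with its weighted average marginal contribution over all coalitions. The sketch below computes exact Shapley values for a hypothetical 3-agent team reward; it shows only the exponential-cost definition, not how SQPG learns or approximates these quantities inside a policy-gradient method.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for characteristic function `value`.

    phi_i = sum over coalitions S not containing i of
            |S|! (n - |S| - 1)! / n! * (value(S | {i}) - value(S)).
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical global-reward game: agent "a" contributes 2, agent "b"
# contributes 1, and "b" and "c" together earn an extra bonus of 3.
def team_reward(coalition):
    r = 0.0
    if "a" in coalition:
        r += 2.0
    if "b" in coalition:
        r += 1.0
    if {"b", "c"} <= coalition:
        r += 3.0
    return r


credits = shapley_values(["a", "b", "c"], team_reward)
```

Here the shared bonus is split evenly between "b" and "c" (1.5 each), so the credits are 2.0, 2.5 and 1.5 and sum to the full team reward of 6, whereas a shared-reward scheme would hand every agent the same 6.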

artificial intelligence, machine learning, reinforcement learning

1907.05707

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > France (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimal and Bounded-Suboptimal Multi-Agent Motion Planning

Cohen, Liron (University of Southern California) | Uras, Tansel (University of Southern California) | Kumar, T. K. Satish (University of Southern California) | Koenig, Sven (University of Southern California)

Multi-Agent Motion Planning (MAMP) is the task of finding conflict-free, kinodynamically feasible plans that take agents from start to goal states. While MAMP is of significant practical importance, existing solvers are incomplete or inefficient, or rely on simplifying assumptions. For example, Multi-Agent Path Finding (MAPF) solvers conventionally assume discrete timesteps and rectilinear movement of agents between neighboring vertices of a graph. In this paper, we develop MAMP solvers that obviate these simplifying assumptions yet generalize the core ideas of state-of-the-art MAPF solvers. Specifically, since different motions may take arbitrarily different durations, MAMP solvers need to reason efficiently about continuous time and arbitrary wait durations. To do so, we adapt (Enhanced) Conflict-Based Search to continuous time and develop a novel bounded-suboptimal extension of Safe Interval Path Planning, called Soft Conflict Interval Path Planning. On the theoretical side, we establish the completeness, optimality and bounded suboptimality of our MAMP solvers. On the experimental side, we show that our MAMP solvers scale well with increasing suboptimality bounds.
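Safe Interval Path Planning handles continuous time and arbitrary waits by collapsing each vertex's timeline into maximal "safe intervals" during which no other agent occupies it. A minimal sketch of that building block, assuming the occupied (unsafe) periods of a vertex are already known as (start, end) pairs and treating interval endpoints loosely (open vs. closed is not distinguished); it is not the paper's Soft Conflict Interval Path Planning.

```python
def safe_intervals(unsafe, horizon=float("inf")):
    """Complement of a set of unsafe time intervals on [0, horizon).

    `unsafe` lists (start, end) periods during which some other agent
    occupies the vertex; they may overlap and arrive in any order.
    Returns the maximal (start, end) periods in which the vertex is free.
    """
    result = []
    t = 0.0  # earliest time not yet known to be unsafe
    for start, end in sorted(unsafe):
        if start > t:
            result.append((t, start))  # gap before this unsafe period
        t = max(t, end)                # merge overlapping unsafe periods
    if t < horizon:
        result.append((t, horizon))
    return result
```

For example, `safe_intervals([(2.0, 3.5), (3.0, 4.0), (7.0, 7.5)])` returns `[(0.0, 2.0), (4.0, 7.0), (7.5, inf)]`: the two overlapping occupations merge, so a search state needs only a (vertex, safe interval) pair rather than one state per timestep.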

agent, artificial intelligence, time interval

Country: North America > United States > California (0.14)

Genre: Research Report (0.68)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Yoeli, Rotem (Ben-Gurion University of the Negev) | Stern, Roni (Ben-Gurion University of the Negev) | Atzmon, Dor (Ben-Gurion University of the Negev)

Measuring the Vulnerability of a Multi-Agent Pathfinding Solution

Multi-agent pathfinding is the problem of finding non-interfering paths for a set of agents such that, if the agents follow these paths, each agent reaches its desired destination. Recent years have seen tremendous advances in this field, with optimal and suboptimal algorithms that can plan paths for over 100 agents in reasonable time. However, autonomous mobile agents are prime targets for cyber-security attacks, in which an adversary may take control of an agent to disrupt the agents' execution of their plan. This threat raises two questions. First, how much damage can an agent do if it does not follow its plan? Second, how can one plan a priori to be as robust as possible to such cyber-attacks? In this work, we provide an answer to both questions. To compute the maximal amount of damage that an adversarial agent can do, we define a corresponding graph search problem and solve it with A*. Then, we provide a very simple method to choose a solution that is robust to such damage. We demonstrate both algorithms in simulation over standard multi-agent pathfinding domains.
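"Non-interfering" can be made concrete with the conflict model common in classic MAPF: two agents conflict if they occupy the same vertex at the same timestep (vertex conflict) or traverse the same edge in opposite directions between consecutive timesteps (swap conflict), with finished agents waiting at their goals. The sketch below assumes that model and lists every conflict in a plan; the vertex names are hypothetical, and this is not the paper's damage measure or its A* formulation.

```python
def all_conflicts(paths):
    """List every vertex and swap conflict in a discrete-time MAPF plan.

    paths[i][t] is agent i's vertex at timestep t. As is common in MAPF,
    an agent that has finished its path is assumed to wait at its goal.
    """
    def at(path, t):
        return path[min(t, len(path) - 1)]

    conflicts = []
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                u, v = at(paths[i], t), at(paths[j], t)
                if u == v:
                    conflicts.append(("vertex", i, j, u, t))
                elif at(paths[i], t + 1) == v and at(paths[j], t + 1) == u:
                    # i moves u -> v while j moves v -> u between t and t + 1.
                    conflicts.append(("swap", i, j, (u, v), t + 1))
    return conflicts
```

A plan is a valid solution under this model exactly when the list is empty. As an illustrative proxy only, one could substitute a deviating path for a single agent and count the conflicts it creates with the others' unchanged paths.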

abnormal action, agent, malicious entity

Country: Asia > Middle East > Israel > Southern District > Beer-Sheva (0.05)

Industry: Information Technology > Security & Privacy (0.88)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.36)

Unbounded Sub-Optimal Conflict-Based Search in Complex Domains

Walker, Thayne T. (University of Denver) | Sturtevant, Nathan R. (University of Alberta) | Felner, Ariel (Ben-Gurion University of the Negev)

Conflict-Based Search (CBS) is a state-of-the-art algorithm for multi-agent pathfinding (MAPF). CBS has been studied in many domains; however, most research has focused on classic domains with point agents that move with unit time steps and unit costs. In this work, we are interested in MAPF solutions for both classic and complex domains, that is, domains that include shaped agents, actions with non-unit costs, non-uniform action durations and/or non-holonomic or kinodynamic movement constraints. Prior work on sub-optimal formulations of CBS has focused on heuristics; our work instead introduces new types of constraints. We show that certain constraint formulations have properties that can make CBS run orders of magnitude faster, but may also make the algorithm incomplete and yield sub-optimal results. We introduce new conditional constraints that allow CBS to exploit these properties to run faster while retaining algorithmic completeness. We additionally formulate a new constraint accumulation technique, called constraint overloading, which uses conditional constraints to achieve further performance gains.
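For reference, the classic CBS that this work extends can be sketched compactly: a high-level best-first search over a constraint tree that, on each conflict, creates one child per involved agent with a new constraint for that agent, and a low-level space-time A* that replans a single agent under its constraints. This sketch assumes the classic setting the abstract contrasts with (4-connected grid, unit-cost moves and waits, point agents, sum-of-costs objective); all function names are hypothetical, and none of the paper's conditional constraints or constraint overloading is implemented.

```python
import heapq
from itertools import count

# A constraint is ("v", vertex, t), forbidding the vertex at timestep t, or
# ("e", (u, v), t), forbidding the move u -> v that ends at timestep t.


def neighbors(cell, grid):
    r, c = cell
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):  # wait + 4 moves
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == ".":
            yield (nr, nc)


def low_level(grid, start, goal, constraints, max_t=50):
    """Space-time A* for one agent that respects its CBS constraints."""
    vcons = {(x, t) for kind, x, t in constraints if kind == "v"}
    econs = {(x, t) for kind, x, t in constraints if kind == "e"}
    # The agent may only stop at its goal after the last constraint there.
    last = max([t for (x, t) in vcons if x == goal] + [0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = count()
    frontier = [(h(start), next(tie), start, 0, [start])]
    seen = set()
    while frontier:
        _, _, cell, t, path = heapq.heappop(frontier)
        if cell == goal and t >= last:
            return path
        if (cell, t) in seen or t >= max_t:
            continue
        seen.add((cell, t))
        for nxt in neighbors(cell, grid):
            if (nxt, t + 1) in vcons or ((cell, nxt), t + 1) in econs:
                continue
            heapq.heappush(frontier, (t + 1 + h(nxt), next(tie), nxt, t + 1, path + [nxt]))
    return None


def find_conflict(paths):
    """First vertex or swap conflict found, as two (agent, constraint) pairs."""
    at = lambda p, t: p[min(t, len(p) - 1)]  # finished agents wait at goal
    for t in range(max(len(p) for p in paths)):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                u, v = at(paths[i], t), at(paths[j], t)
                if u == v:
                    return [(i, ("v", u, t)), (j, ("v", u, t))]
                if at(paths[i], t + 1) == v and at(paths[j], t + 1) == u:
                    return [(i, ("e", (u, v), t + 1)), (j, ("e", (v, u), t + 1))]
    return None


def cbs(grid, starts, goals):
    """High-level CBS: best-first search over the constraint tree."""
    n = len(starts)
    root_cons = [frozenset() for _ in range(n)]
    paths = [low_level(grid, starts[i], goals[i], root_cons[i]) for i in range(n)]
    if any(p is None for p in paths):
        return None
    cost = lambda ps: sum(len(p) - 1 for p in ps)  # sum of arrival times
    tie = count()
    open_list = [(cost(paths), next(tie), root_cons, paths)]
    while open_list:
        _, _, cons, paths = heapq.heappop(open_list)
        conflict = find_conflict(paths)
        if conflict is None:
            return paths
        for agent, constraint in conflict:  # one child per conflicting agent
            child_cons = list(cons)
            child_cons[agent] = cons[agent] | {constraint}
            new_path = low_level(grid, starts[agent], goals[agent], child_cons[agent])
            if new_path is not None:
                child_paths = list(paths)
                child_paths[agent] = new_path
                heapq.heappush(open_list, (cost(child_paths), next(tie), child_cons, child_paths))
    return None


# Two agents swapping ends of a 2x3 open grid; one must detour via row 1.
solution = cbs(["...", "..."], starts=[(0, 0), (0, 2)], goals=[(0, 2), (0, 0)])
```

The branching step in `cbs` is the point the abstract targets: the choice of which constraints to add to each child determines speed, completeness and optimality.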

conditional constraint, conflict, constraint

Country:

North America > United States > New York (0.05)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.05)
Asia > Middle East > Israel (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)

Surynek, Pavel (Czech Technical University in Prague)

Unifying Search-Based and Compilation-Based Approaches to Multi-Agent Path Finding through Satisfiability Modulo Theories

We describe an attempt to unify search-based and compilation-based approaches to multi-agent path finding (MAPF) through satisfiability modulo theories (SMT). The task in MAPF is to navigate agents in an undirected graph to given goal vertices so that they do not collide. We rephrase Conflict-Based Search (CBS), one of the state-of-the-art algorithms for optimal MAPF solving, in terms of SMT. This idea combines SAT-based solving, known from MDD-SAT (a SAT-based optimal MAPF solver), at the low level with the conflict elimination of CBS at the high level. Where standard CBS branches the search after a conflict occurs, we instead refine the propositional model with a disjunctive constraint. Our novel algorithm, called SMT-CBS, hence does not branch at the high level but incrementally extends the propositional model that is passed to the SAT solver at each iteration.
We experimentally compare SMT-CBS with CBS and MDD-SAT.
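The contrast between branching and refinement can be illustrated for a single vertex conflict. Assuming Boolean variables x(agent, vertex, t) meaning "agent occupies vertex at timestep t", CBS would split into two children (forbid agent i, or forbid agent j), whereas the disjunctive refinement adds the one clause (not x_i or not x_j) and lets the SAT solver decide. The sketch below shows only that clause generation, with lazily allocated DIMACS-style variable ids and a tiny assignment checker; all names are hypothetical, and it does not reproduce the MDD-SAT encoding or the authors' implementation.

```python
class VarPool:
    """Lazily maps (agent, vertex, timestep) triples to DIMACS-style ids."""

    def __init__(self):
        self.ids = {}

    def var(self, agent, vertex, t):
        key = (agent, vertex, t)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1  # ids start at 1, as in DIMACS
        return self.ids[key]


def refine_vertex_conflict(clauses, pool, agent_i, agent_j, vertex, t):
    """Instead of two CBS children, add one clause (not x_i or not x_j):
    at most one of the two agents may occupy `vertex` at timestep t."""
    clauses.append([-pool.var(agent_i, vertex, t), -pool.var(agent_j, vertex, t)])


def satisfies(clauses, true_vars):
    """Check an assignment given as the set of variable ids that are true."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
               for clause in clauses)


pool = VarPool()
clauses = []
refine_vertex_conflict(clauses, pool, agent_i=0, agent_j=1, vertex="v3", t=2)
```

After the refinement, `clauses` is `[[-1, -2]]`: assignments placing either agent alone at "v3" at timestep 2 remain feasible, while placing both there is ruled out, which covers both CBS children in a single model.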

agent, collision, constraint

Country: Europe > Czechia > Prague (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Surynek, Pavel (Czech Technical University in Prague)

Multi-Agent Path Finding with Continuous Time and Geometric Agents Viewed through Satisfiability Modulo Theories (SMT)

This paper addresses a variant of multi-agent path finding (MAPF) in continuous space and time. We present a new solving approach based on satisfiability modulo theories (SMT) to obtain makespan-optimal solutions. Standard MAPF is the task of navigating agents in an undirected graph from given start vertices to given goal vertices so that agents do not collide with each other in vertices of the graph. In the continuous version (MAPF-R), agents move in an n-dimensional Euclidean space along straight lines that interconnect predefined positions. For simplicity, we work with circular omni-directional agents having constant velocities in the 2D plane. As agents can have different sizes and move smoothly along lines, a movement along certain lines that is collision-free for small agents can result in a collision if the same movement is performed by larger agents. Our SMT-based approach for MAPF-R, called SMT-CBS-R, reformulates the Conflict-Based Search (CBS) algorithm in terms of SMT concepts. We suggest lazy generation of decision variables and constraints: each time a new conflict is discovered, the underlying encoding is extended with new variables and constraints to eliminate the conflict. We experimentally compared SMT-CBS-R with adaptations of CBS for the continuous variant of MAPF.
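The size-dependence of collisions noted above reduces to a minimum-distance test between two moving discs. A minimal sketch, assuming two circular agents moving at constant 2D velocities with positions parameterized as p + v*t (p being the position at t = 0) and checked over a shared time window [t0, t1]; the function name and parameters are hypothetical, and this is only the geometric primitive, not the paper's SMT encoding.

```python
import math


def discs_collide(p1, v1, r1, p2, v2, r2, t0, t1):
    """Do two moving discs overlap at some time in [t0, t1]?

    Agent k is at p_k + v_k * t with radius r_k. The relative offset is
    d(t) = (p1 - p2) + (v1 - v2) * t, and |d(t)|^2 is a convex quadratic,
    so its minimum on [t0, t1] is at the clamped unconstrained minimizer.
    """
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    dvx, dvy = v1[0] - v2[0], v1[1] - v2[1]
    a = dvx * dvx + dvy * dvy
    # Equal velocities mean a constant distance; any time in the window works.
    t_star = t0 if a == 0 else -(dx * dvx + dy * dvy) / a
    t_star = min(max(t_star, t0), t1)
    mx, my = dx + dvx * t_star, dy + dvy * t_star
    return math.hypot(mx, my) < r1 + r2
```

Two agents passing on parallel lines 2 units apart, for example `discs_collide((0, 0), (1, 0), r, (4, 2), (-1, 0), r, 0, 4)`, do not collide with r = 0.5 but do with r = 1.2: the same movement, performed by larger agents, becomes a conflict.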

agent, collision, multi-agent path

Country: Europe > Czechia > Prague (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.90)