Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search
Long, Jeffrey Richard (University of Alberta) | Sturtevant, Nathan R. (University of Alberta) | Buro, Michael (University of Alberta) | Furtak, Timothy (University of Alberta)
Perfect Information Monte Carlo (PIMC) search is a practical technique for playing imperfect information games that are too large to be optimally solved. Although PIMC search has been criticized in the past for its theoretical deficiencies, in practice it has often produced strong results in a variety of domains. In this paper, we set out to resolve this discrepancy. The contributions of the paper are twofold. First, we use synthetic game trees to identify game properties that result in strong or weak performance for PIMC search as compared to an optimal player. Second, we show how these properties can be detected in real games, and demonstrate that they do indeed appear to be good predictors of the strength of PIMC search. Thus, using the tools established in this paper, it should be possible to decide a priori whether PIMC search will be an effective approach to new and unexplored games.
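As a rough illustration of the technique the abstract names, PIMC search can be sketched as: sample fully observable worlds consistent with what the player knows, evaluate every move in each world as if information were perfect, and play the move with the best average. The function and parameter names below (`pimc_choose`, `sample_world`, `perfect_info_value`) and the toy coin game are assumptions made for this sketch, not taken from the paper.

```python
import random
from collections import defaultdict

def pimc_choose(moves, sample_world, perfect_info_value, num_samples=100, rng=None):
    """Return the move with the highest total perfect-information value over sampled worlds."""
    rng = rng or random.Random(0)
    totals = defaultdict(float)
    for _ in range(num_samples):
        # One fully observable world consistent with the player's information.
        world = sample_world(rng)
        for move in moves:
            # Value of the move if the hidden state were known to be `world`.
            totals[move] += perfect_info_value(world, move)
    return max(moves, key=lambda m: totals[m])

# Hypothetical toy game: a hidden coin is 0 or 1. Guessing right pays +1,
# guessing wrong pays -1, and declining to guess pays 0.5.
def coin_value(world, move):
    if move == "safe":
        return 0.5
    return 1.0 if move == f"guess{world}" else -1.0

if __name__ == "__main__":
    choice = pimc_choose(["guess0", "guess1", "safe"],
                         lambda rng: rng.choice([0, 1]),
                         coin_value)
    print(choice)
```

In a real game the per-world evaluation would typically be a perfect-information search such as alpha-beta; the sketch leaves that behind the `perfect_info_value` callback.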
Dealing with Infinite Loops, Underestimation, and Overestimation of Depth-First Proof-Number Search
Kishimoto, Akihiro (Tokyo Institute of Technology and JST PRESTO)
Depth-first proof-number search (df-pn) is a powerful AND/OR tree search algorithm for solving positions in games. However, df-pn has a notorious problem of infinite loops when applied to domains with repetitions. Df-pn(r) cures it by ignoring proof and disproof numbers that may lead to infinite loops. This paper points out that df-pn(r) has a serious issue of underestimating proof and disproof numbers, while it also suffers from the overestimation problem that occurs in directed acyclic graphs. It then presents two practical solutions to these problems. While bypassing infinite loops, the threshold controlling algorithm (TCA) solves the underestimation problem by increasing the thresholds of df-pn. The source node detection algorithm (SNDA) detects the cause of overestimation and modifies the computation of proof and disproof numbers. Both TCA and SNDA are implemented on top of df-pn to solve tsume-shogi (checkmating problems in Japanese chess). Results show that df-pn with TCA and SNDA is far superior to df-pn(r). Our tsume-shogi solver is able to solve several difficult positions previously unsolved by any other solver.
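For readers unfamiliar with proof and disproof numbers, the following minimal sketch applies the standard proof-number rules to a small explicit tree: an OR node takes the minimum proof number and the sum of disproof numbers of its children, and an AND node does the reverse. It is not df-pn, TCA, or SNDA; the tuple encoding of nodes and the name `proof_numbers` are assumptions chosen for this example. The example also shows how a node shared by two parents is counted twice, which is the kind of overestimation the abstract refers to.

```python
import math

INF = math.inf

# Initial values for leaves: (proof number, disproof number).
LEAF_VALUES = {"unknown": (1, 1), "proven": (0, INF), "disproven": (INF, 0)}

def proof_numbers(node):
    """Return (pn, dn) for a node encoded as ('leaf', status) or ('or' | 'and', [children])."""
    kind, payload = node
    if kind == "leaf":
        return LEAF_VALUES[payload]
    values = [proof_numbers(child) for child in payload]
    pns = [pn for pn, _ in values]
    dns = [dn for _, dn in values]
    if kind == "or":
        # Proving one child suffices; disproving requires all children.
        return min(pns), sum(dns)
    # AND node: proving requires all children; disproving one suffices.
    return sum(pns), min(dns)

# The same unknown leaf is reachable through both OR children of the root.
shared = ("leaf", "unknown")
root = ("and", [("or", [shared, ("leaf", "unknown")]),
                ("or", [shared, ("leaf", "unknown")])])

if __name__ == "__main__":
    # Proving `shared` alone would prove the root, yet the tree rule charges it twice.
    print(proof_numbers(root))
```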
Parallel Depth First Proof Number Search
Kaneko, Tomoyuki (The University of Tokyo)
Depth-first proof-number search (df-pn) is an effective and popular algorithm for solving AND/OR tree problems by using proof and disproof numbers. This paper presents a simple but effective parallelization of the df-pn search algorithm for a shared-memory system. In this parallelization, multiple agents autonomously conduct df-pn with a shared transposition table. For effective cooperation of agents, virtual proof and disproof numbers are introduced for each node. These estimate future proof and disproof numbers by treating the number of agents working on the node's descendants as a possible increase. Experimental results on large checkmate problems in shogi, a popular chess variant in Japan, show that reasonable increases in speed were achieved with small overheads in memory.
High-Quality Policies for the Canadian Traveler's Problem
Eyerich, Patrick (Albert-Ludwigs-Universität Freiburg) | Keller, Thomas (Albert-Ludwigs-Universität Freiburg) | Helmert, Malte (Albert-Ludwigs-Universität Freiburg)
We consider the stochastic variant of the Canadian Traveler's Problem, a path planning problem where adverse weather can cause some roads to be untraversable. The agent does not initially know which roads can be used. However, it knows a probability distribution for the weather, and it can observe the status of roads incident to its location. The objective is to find a policy with low expected travel cost. We introduce and compare several algorithms for the stochastic CTP. Unlike the optimistic approach most commonly considered in the literature, the new approaches we propose take uncertainty into account explicitly. We show that this property enables them to generate policies of much higher quality than the optimistic one, both theoretically and experimentally.
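The optimistic approach mentioned in the abstract can be sketched as follows: treat every road not yet known to be blocked as traversable, follow the first step of a shortest path, observe the roads at the new location, and replan. This is only a sketch of that baseline under assumptions made here (undirected roads stored as a dict of `(u, v) -> cost`, hypothetical names `shortest_path` and `optimistic_travel`); it does not implement the new algorithms the paper proposes.

```python
import heapq

def shortest_path(edges, start, goal, known_blocked):
    """Dijkstra over undirected roads, skipping roads already known to be blocked."""
    adj = {}
    for (u, v), cost in edges.items():
        if frozenset((u, v)) in known_blocked:
            continue
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist, prev, heap = {start: 0}, {}, [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def optimistic_travel(edges, start, goal, truly_blocked):
    """Total cost paid by the optimistic policy, or None if the goal is unreachable."""
    known_blocked, location, total = set(), start, 0
    while location != goal:
        # Observe the status of every road incident to the current location.
        for (u, v) in edges:
            road = frozenset((u, v))
            if location in (u, v) and road in truly_blocked:
                known_blocked.add(road)
        path = shortest_path(edges, location, goal, known_blocked)
        if path is None:
            return None
        nxt = path[1]
        cost = edges[(location, nxt)] if (location, nxt) in edges else edges[(nxt, location)]
        total += cost
        location = nxt
    return total

# Hypothetical map: the cheap route s-a-g turns out to be blocked at a-g.
roads = {("s", "a"): 1, ("a", "g"): 1, ("s", "g"): 5}

if __name__ == "__main__":
    print(optimistic_travel(roads, "s", "g", {frozenset(("a", "g"))}))
```

On this map the optimistic agent walks to `a`, discovers the blockage, and backtracks, which illustrates the cost of ignoring the weather distribution when planning.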
Independent Additive Heuristics Reduce Search Multiplicatively
Breyer, Teresa Maria (UCLA) | Korf, Richard (UCLA)
This paper analyzes the performance of IDA* using additive heuristics. We show that the reduction in the number of nodes expanded using multiple independent additive heuristics is the product of the reductions achieved by the individual heuristics. First, we formally state and prove this result on unit edge-cost undirected graphs with a uniform branching factor. Then, we empirically verify it on a model of the 4-peg Towers of Hanoi problem. We also run experiments on the multiple sequence alignment problem showing more general applicability to non-unit edge-cost directed graphs. Then, we extend an existing model to predict the performance of IDA* with a single pattern database to independent additive disjoint pattern databases. This is the first analysis of the performance of independent additive heuristics.
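The multiplicative claim can be written out as a small worked example. The symbols $N$, $r_1$, and $r_2$ are notation assumed here for illustration and do not come from the paper. Suppose a search without heuristics expands $N$ nodes, heuristic $h_1$ alone reduces this to $N / r_1$, and heuristic $h_2$ alone reduces it to $N / r_2$. If the two heuristics are independent and additive, the stated result says that using $h_1 + h_2$ gives approximately

```latex
\[
  N_{h_1 + h_2} \approx \frac{N}{r_1 \, r_2},
  \qquad \text{e.g. } r_1 = 10,\ r_2 = 20
  \ \Longrightarrow\ N_{h_1 + h_2} \approx \frac{N}{200}.
\]
```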
Transmission Network Expansion Planning with Simulation Optimization
Bent, Russell (Los Alamos National Laboratory) | Berscheid, Alan (Los Alamos National Laboratory) | Toole, G. Loren (Los Alamos National Laboratory)
Within the electric power literature, the transmission network expansion planning (TNEP) problem refers to the problem of how to upgrade an electric power network to meet future demands. Because this is a complex, non-linear, and non-convex optimization problem, researchers have traditionally focused on approximate models of power flows. Existing approaches are often tightly coupled to the approximation choice. Until recently, these approximations have produced results that are straightforward to adapt to the more complex (real) problem. However, the power grid is evolving towards a state where the adaptations are no longer easy (e.g., large amounts of limited-control renewable generation), which necessitates new optimization techniques. In this paper, we propose a local search variation of the powerful Limited Discrepancy Search (LDLS) that encapsulates the complexity of power flows in a black box that may be queried for information about the quality of a proposed expansion. This allows the development of a new optimization algorithm that is independent of the underlying power model.
Integrating Reinforcement Learning into a Programming Language
Simpkins, Christopher (Georgia Institute of Technology)
Creating artificial intelligent agents that are high-fidelity simulations of natural agents will require the engagement of behavioral scientists. However, agent programming systems that are accessible to behavioral scientists are too limited to create rich agents, and systems for creating rich agents are accessible mainly to computer scientists, not behavioral scientists. We are solving this problem by engaging behavioral scientists in the design of a programming language, and integrating reinforcement learning into the programming language. This strategy will help our language achieve adaptivity, modularity, and, most importantly, accessibility to behavioral scientists. In addition to allowing behavioral scientists to write rich agent programs, our language, AFABL (A Friendly Behavior Language), will enable a true discipline of modular agent software engineering with broad implications for games, interactive storytelling, and social simulations.
Local Optimization for Simulation of Natural Motion
Erez, Tom (Washington University in St. Louis)
I intend to use RL to bring the two together, and generate motion from the proposed first principles in realistic biomechanical models, and compare the results to the behavior of living creatures. This is a nontrivial problem: biomechanical models are continuous, high-dimensional and nonlinear, and the optimality criteria considered in the literature are non-quadratic. In order to address these profound challenges, I propose three basic principles [...] The Reinforcement Learning (RL) agent interacts with a dynamical system whose states capture all the relevant information about the current configuration of the agent and its environment. By specifying a sequence of actions, the agent alters the state transitions of this dynamical system. The optimality criterion is formalized by a reward function defined over state-action pairs, and the agent's goal is to maximize the cumulative reward.
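The RL setup described here (a dynamical system, a sequence of actions, and a reward over state-action pairs that is summed into a cumulative reward) can be sketched in a few lines. The names `rollout_return`, `step`, and `reward`, and the one-dimensional toy system, are assumptions for illustration only; they are not the biomechanical models or the optimization method of the thesis.

```python
def rollout_return(state, actions, step, reward):
    """Apply an action sequence to a deterministic system and sum the rewards."""
    total = 0.0
    for action in actions:
        total += reward(state, action)   # reward is defined over state-action pairs
        state = step(state, action)      # the action determines the state transition
    return total

# Hypothetical toy system: the state is a position on a line, the action is a
# displacement, and the reward penalizes distance from the origin and effort.
def toy_step(x, a):
    return x + a

def toy_reward(x, a):
    return -(x ** 2) - a ** 2

if __name__ == "__main__":
    print(rollout_return(2, [-1, -1, 0], toy_step, toy_reward))
```

An optimizer would search over the action sequence to make this sum as large as possible; the sketch only evaluates one candidate sequence.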
Multi-Agent Fault Tolerance Inspired by a Computational Analysis of Cancer
Olsen, Megan (University of Massachusetts Amherst)
My thesis investigates fault tolerance for cooperative agent systems that have some equivalent of self-replication and self-death. Utilizing biologically-inspired mechanisms, I increase multi-agent system robustness for faulty agents when it is unknown exactly which agent is malfunctioning. It is important to determine new ways to increase robustness of a system, as otherwise it cannot be guaranteed to function in all situations and thus cannot be relied upon. Robustness of a system allows agents to recover from errors and thus function continuously, an increasingly important trait as agent systems are deployed in real world scenarios such as sensor networks or surveillance systems where faulty or malicious nodes could disrupt application performance. To achieve robustness, there must either be prevention of all errors, or a technique for recovering from errors after they have occurred. My thesis creates a new fault tolerance mechanism inspired by cancer biology to remove faulty agents, and then re-applies the developed technique to study the removal of biological cancer cells in simulation.
On Multi-Robot Area Coverage
Fazli, Pooyan (University of British Columbia)
Area coverage is one of the emerging problems in multi-robot coordination. In this task, a team of robots cooperatively tries to observe or sweep an entire area, possibly containing obstacles, with their sensors or actuators. The goal is to build an efficient path for each robot such that the paths jointly ensure that every single point in the environment can be seen or swept by at least one of the robots while performing the task.