Search
High-Level Representations for Game-Tree Search in RTS Games
Uriarte, Alberto (Drexel University) | Ontañón, Santiago (Drexel University)
From an AI point of view, Real-Time Strategy (RTS) games are hard because they have enormous state spaces, they are real-time and partially observable. In this paper, we explore an approach to deploy game-tree search in RTS games by using game state abstraction, and explore the effect of using different abstractions over the game state. Different abstractions capture different parts of the game state, and result in different branching factors when used for game-tree search algorithms. We evaluate the different representations using Monte Carlo Tree Search in the context of StarCraft.
Leveraging Parallel Architectures in AI Search Algorithms for Games
Barriga, Nicolas A. (University of Alberta)
This document contains a summary of research performed by the author on the topic of search algorithms for games. An outline of the problems being addressed is provided, along with the progress already made, and planned future work. The specific subjects studied are: parallelizing UCT search on GPUs, the development of a hierarchical search framework for Real-Time Strategy (RTS) games and the building placement problem in RTS games. We propose to take advantage of different parallel architectures to help solve these problems.
Game-Tree Search over High-Level Game States in RTS Games
Uriarte, Alberto (Drexel University) | Ontañón, Santiago (Drexel University)
From an AI point of view, Real-Time Strategy (RTS) games are hard because they have enormous state spaces, they are real-time and partially observable. In this paper, we present an approach to deploy game-tree search in RTS games by using game state abstraction. We propose a high-level abstract representation of the game state, that significantly reduces the branching factor when used for game-tree search algorithms. Using this high-level representation, we evaluate versions of alpha-beta search and of Monte Carlo Tree Search (MCTS). We present experiments in the context of StarCraft showing promising results in dealing with the large branching factors present in RTS games.
Algorithm Selection for Combinatorial Search Problems: A Survey
Kotthoff, Lars (University College Cork)
The algorithm selection problem is concerned with selecting the best algorithm to solve a given problem instance on a case-by-case basis. It has become especially relevant in the last decade, with researchers increasingly investigating how to identify the most suitable existing algorithm for solving a problem instance instead of developing new algorithms. This survey presents an overview of this work focusing on the contributions made in the area of combinatorial search problems, where algorithm selection techniques have achieved significant performance improvements. We unify and organise the vast literature according to criteria that determine algorithm selection systems in practice. The comprehensive classification of approaches identifies and analyses the different directions from which algorithm selection has been approached. This article contrasts and compares different methods for solving the problem as well as ways of using these solutions.
Simple Regret Optimization in Online Planning for Markov Decision Processes
We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. Formally, the performance of algorithms for online planning is assessed in terms of simple regret, the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate and smooth reduction of simple regret. At a high level, BRUE is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. We further extend BRUE with a variant of ``learning by forgetting.'' The resulting parametrized algorithm, BRUE(alpha), exhibits even more attractive formal guarantees than BRUE. Our empirical evaluation shows that both BRUE and its generalization, BRUE(alpha), are also very effective in practice and compare favorably to the state-of-the-art.
Why Local Search Excels in Expression Simplification
Ruijl, Ben, Plaat, Aske, Vermaseren, Jos, Herik, Jaap van den
Simplifying expressions is important to make numerical integration of large expressions from High Energy Physics tractable. To this end, Horner's method can be used. Finding suitable Horner schemes is assumed to be hard, due to the lack of local heuristics. Recently, MCTS was reported to be able to find near optimal schemes. However, several parameters had to be fine-tuned manually. In this work, we investigate the state space properties of Horner schemes and find that the domain is relatively flat and contains only a few local minima. As a result, the Horner space is appropriate to be explored by Stochastic Local Search (SLS), which has only two parameters: the number of iterations (computation time) and the neighborhood structure. We found a suitable neighborhood structure, leaving only the allowed computation time as a parameter. We performed a range of experiments. The results obtained by SLS are similar or better than those obtained by MCTS. Furthermore, we show that SLS obtains the good results at least 10 times faster. Using SLS, we can speed up numerical integration of many real-world large expressions by at least a factor of 24. For High Energy Physics this means that numerical integrations that took weeks can now be done in hours.
A Tabu Search Algorithm for the Multi-period Inspector Scheduling Problem
Qin, Hu, Zhang, Zizhen, Xie, Yubin, Lim, Andrew
This paper introduces a multi-period inspector scheduling problem (MPISP), which is a new variant of the multi-trip vehicle routing problem with time windows (VRPTW). In the MPISP, each inspector is scheduled to perform a route in a given multi-period planning horizon. At the end of each period, each inspector is not required to return to the depot but has to stay at one of the vertices for recuperation. If the remaining time of the current period is insufficient for an inspector to travel from his/her current vertex $A$ to a certain vertex B, he/she can choose either waiting at vertex A until the start of the next period or traveling to a vertex C that is closer to vertex B. Therefore, the shortest transit time between any vertex pair is affected by the length of the period and the departure time. We first describe an approach of computing the shortest transit time between any pair of vertices with an arbitrary departure time. To solve the MPISP, we then propose several local search operators adapted from classical operators for the VRPTW and integrate them into a tabu search framework. In addition, we present a constrained knapsack model that is able to produce an upper bound for the problem. Finally, we evaluate the effectiveness of our algorithm with extensive experiments based on a set of test instances. Our computational results indicate that our approach generates high-quality solutions.
On Minimax Optimal Offline Policy Evaluation
Li, Lihong, Munos, Remi, Szepesvari, Csaba
This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax risk lower bound, and analyze the risk of two standard estimators. It is shown, and verified in simulation, that one is minimax optimal up to a constant, while another can be arbitrarily worse, despite its empirical success and popularity. The results are applied to related problems in contextual bandits and fixed-horizon Markov decision processes, and are also related to semi-supervised learning.
e-Valuate: A Two-player Game on Arithmetic Expressions -- An Update
Aravamuthan, Sarang, Ganguly, Biswajit
e-Valuate is a game on arithmetic expressions. The players have contrasting roles of maximizing and minimizing the given expression. The maximizer proposes values and the minimizer substitutes them for variables of his choice. When the expression is fully instantiated, its value is compared with a certain minimax value that would result if the players played to their optimal strategies. The winner is declared based on this comparison. We use a game tree to represent the state of the game and show how the minimax value can be computed efficiently using backward induction and alpha-beta pruning. The efficacy of alpha-beta pruning depends on the order in which the nodes are evaluated. Further improvements can be obtained by using transposition tables to prevent reevaluation of the same nodes. We propose a heuristic for node ordering. We show how the use of the heuristic and transposition tables lead to improved performance by comparing the number of nodes pruned by each method. We describe some domain-specific variants of this game. The first is a graph theoretic formulation wherein two players share a set of elements of a graph by coloring a related set with each player looking to maximize his share. The set being shared could be either the set of vertices, edges or faces (for a planar graph). An application of this is the sharing of regions enclosed by a planar graph where each player's aim is to maximize the area of his share. Another variant is a tiling game where the players alternately place dominoes on a $8 \times 8$ checkerboard to construct a maximal partial tiling. We show that the size of the tiling $x$ satisfies $22 \le x \le 32$ by proving that any maximal partial tiling requires at least $22$ dominoes.
Simulating Non Stationary Operators in Search Algorithms
Goëffon, Adrien, Lardeux, Frédéric, Saubion, Frédéric
In this paper, we propose a model for simulating search operators whose behaviour often changes continuously during the search. In these scenarios, the performance of the operators decreases when they are applied. This is motivated by the fact that operators for optimization problems are often roughly classified into exploitation operators and exploration operators. Our simulation model is used to compare the different performances of operator selection policies and clearly identify their ability to adapt to such specific operators behaviours. The experimental study provides interesting results on the respective behaviours of operator selection policies when faced to such non stationary search scenarios. Keywords: Island Models, Adaptive Operator Selection 1. Introduction Selecting the most suitable operators in a search algorithm when solving optimization problems is an active research area (Eiben et al., 2007; Lobo et al., 2007). Given an optimization problem, a search algorithm mainly consists in applying basic solving operators -- heuristics -- in order to explore and exploit the search space for retrieving solutions.