Search
Trade-Offs in Sampling-Based Adversarial Planning
Ramanujan, Raghuram (Cornell University) | Selman, Bart (Cornell University)
The Upper Confidence bounds for Trees (UCT) algorithm has in recent years captured the attention of the planning and game-playing community due to its notable success in the game of Go. However, attempts to reproduce similar levels of performance in domains that are the forte of Minimax-style algorithms have been largely unsuccessful, making any comparative studies of the two hard. In this paper, we study UCT in the game of Mancala, which to our knowledge is the first domain where both search algorithms perform quite well with minimal enhancement. We focus on the three key components of the UCT algorithm in its purest form - targeted node expansion, state value estimation via playouts and averaging backups - and look at their contributions to the overall performance of the algorithm. We study the trade-offs involved in using alternate ways to perform these steps. Finally, we demonstrate a novel hybrid approach to enhancing UCT, that exploits its superior decision accuracy in regions of the search space with few terminal nodes.
Searching for Plans with Carefully Designed Probes
Lipovetzky, Nir (Universitat Pompeu Fabra) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
We define a probe to be a single action sequence computedgreedily from a given state that either terminates in the goalor fails. We show that by designing these probes carefullyusing a number of existing and new polynomial techniquessuch as helpful actions, landmarks, commitments, and con-sistent subgoals, a single probe from the initial state solvesby itself 683 out of 980 problems from previous IPCs, a num-ber that compares well with the 627 problems solved by FFin EHC mode, with similar times and plan lengths. We alsoshow that by launching one probe from each expanded statein a standard greedy best first search informed by the addi-tive heuristic, the number of problems solved jumps to 900(92%), as opposed to FF that solves 827 problems (84%),and LAMA that solves 879 (89%). The success of probessuggests that many domains can be solved easily once a suit-able serialization of the landmarks is found, an observationthat may open new connections between recent work in plan-ning and more classical work concerning goal serializationand problem decomposition in planning and search.
Heuristic Search for Generalized Stochastic Shortest Path MDPs
Kolobov, Andrey (University of Washington, Seattle) | Mausam, Mausam (University of Washington, Seattle) | Weld, Daniel S. (University of Washington, Seattle) | Geffner, Hector (ICREA and Universitat Pompeu Fabra)
Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper’s main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
The Multi-Round Balanced Traveling Tournament Problem
Hoshino, Richard (National Institute of Informatics) | Kawarabayashi, Ken-ichi (National Institute of Informatics)
Given an n -team sports league, the Traveling Tournament Problem (TTP) seeks to determine an optimal double round-robin schedule minimizing the sum total of distances traveled by the n teams as they move from city to city. In the TTP, the number of "rounds" is fixed at r = 2. In this paper, we propose the Multi-Round Balanced Traveling Tournament Problem (mb-TTP), inspired by the actual league structure of Japanese professional baseball, where n = 6 teams play 120 intra-league games over r = 8 rounds, subject to various constraints that ensure competitive balance. These additional balancing constraints enable us to reformulate the 2 k -round mb-TTP as a shortest path problem on a directed graph, for all k >= 1. We apply our theoretical algorithm to the 6-team Nippon (Japanese) Professional Baseball Central League, creating a distance-optimal schedule with 57836 kilometres of total travel, a 26.8% reduction compared to the 79067 kilometres traveled by these six teams during the 2010 regular season.
Where Ignoring Delete Lists Works, Part II: Causal Graphs
The ignoring delete lists relaxation is of paramount importance for both satisficing and optimal planning. In earlier work (Hoffmann 2005), it was observed that the optimal relaxation heuristic h+ has amazing qualities in many classical planning benchmarks, in particular pertaining to the complete absence of local minima. The proofs of this are hand-made, raising the question whether such proofs can be lead automatically by domain analysis techniques. In contrast to earlier disappointing results (Hoffmann 2005) — the analysis method has exponential runtime and succeeds only in two extremely simple benchmark domains — we herein answer this question in the affirmative. We establish connections between causal graph structure and h+ topology. This results in low-order polynomial time analysis methods, implemented in a tool we call TorchLight. Of the 12 domains where the absence of local minima has been proved, TorchLight gives strong success guarantees in 8 domains. Empirically, its analysis exhibits strong performance in a further 2 of these domains, plus in 4 more domains where local minima may exist but are rare. In this way, TorchLight can distinguish ``easy'' domains from "hard" ones. By summarizing structural reasons for analysis failure, TorchLight also provides diagnostic output indicating domain aspects that may cause local minima.
Hybrid Value Iteration for POMDPs
Maniloff, Diego (Massachusetts Institute of Technology) | Gmytrasiewicz, Piotr (University of Illinois at Chicago)
The Partially Observable Markov Decision Process (POMDP) provides a rich mathematical model for designing agents that have to formulate plans under uncertainty. The curses of dimensionality and history associated with solving POMDPs have lead to numerous refinements of the value iteration algorithm. Several exact methods with different pruning strategies have been devised, yet, limited scalability has lead research to focus on ways to approximate the optimal value function. One set of approximations relies on point-based value iteration, which maintains a fixed-size value function, and is typically executed offline. Another set of approximations relies on tree search, which explores the implicit tree defined by the value iteration equation, and is typically executed online. In this paper we present a hybrid value iteration algorithm that combines the benefits of point-based value iteration and tree search. Using our approach, a hybrid agent executes tree search online, and occasionally updates its offline-computed lower bound on the optimal value function, resulting in improved lookahead and higher obtained reward, while meeting real-time constraints. Thus, unlike other hybrid algorithms that use an invariant value function computed offline, our proposed scheme uses information from the real-time tree search process to reason whether to perform a point-based backup online. Keeping track of partial results obtained during online planning makes the computation of point-based backups less prohibitive. We report preliminary results that support our approach.
Winner Determination for Simultaneous Multi-Robot Task Allocation
Tang, Fang (California State Polytechnic University, Pomona) | Saha, Spondon (California State Polytechnic University, Pomona)
Multi-robot task allocation is an important problem for heterogeneous mobile robots. Simultaneous allocations with which multiple tasks are being allocated concurrently tend to lead to more efficient allocations than online or single task allocations. However, the simultaneous allocation also increases the complexity in the winner determination process, especially when robots are required to collaborate in order to accomplish certain tasks. This paper presents a winner determination algorithm for the simultaneous allocation of multi-robot tasks. The complete approach layers alow-level coalition formation algorithm for solving one multi-robot task with a high-level simultaneous task allocation approach. We implement a tree-based winner determination algorithm with an iterative deepening A* (IDA*) search and show that the algorithm is able to generate the optimal task-coalition mapping in the initial round and the IDA* performs efficiently based on time and space complexities.
Graph-Based Knowledge Discovery: Compression versus Frequency
Eberle, William (Tennessee Tech University) | Holder, Lawrence B. (Washington State University )
There are two primary types of graph-based data miners: frequent subgraph and compression-based miners. With frequent subgraph miners, the most interesting substructure is the largest one (or ones) that meet the minimum support. Whereas, compression-based graph miners discover those subgraphs that maximize the amount of compression that a particular substructure provides a graph. The algorithms associated with these two approaches are not only different, but they also may result in dramatic performance differences, as well as in the normative patterns being discovered. In order to compare these two types of graph-based approaches to knowledge discovery, in the following sections we will compare two publicly available applications: GASTON and SUBDUE.
Rook Jumping Maze Generation for AI Education
Neller, Todd William (Gettysburg College)
Rook Jumping Maze design provides a number of good opportunities for experiential learning of AI concepts, including uninformed search, stochastic local search, machine learning, and objective/utility function design. In this paper we will define the maze and present a collection of exercises that allow exploration of several AI topics in the context of an engaging, fun, and unifying task.
Solving Rubik's Cube Using SAT Solvers
Rubik's Cube is an easily-understood puzzle, which is originally called the "magic cube". It is a well-known planning problem, which has been studied for a long time. Yet many simple properties remain unknown. This paper studies whether modern SAT solvers are applicable to this puzzle. To our best knowledge, we are the first to translate Rubik's Cube to a SAT problem. To reduce the number of variables and clauses needed for the encoding, we replace a naive approach of 6 Boolean variables to represent each color on each facelet with a new approach of 3 or 2 Boolean variables. In order to be able to solve quickly Rubik's Cube, we replace the direct encoding of 18 turns with the layer encoding of 18-subtype turns based on 6-type turns. To speed up the solving further, we encode some properties of two-phase algorithm as an additional constraint, and restrict some move sequences by adding some constraint clauses. Using only efficient encoding cannot solve this puzzle. For this reason, we improve the existing SAT solvers, and develop a new SAT solver based on PrecoSAT, though it is suited only for Rubik's Cube. The new SAT solver replaces the lookahead solving strategy with an ALO (\emph{at-least-one}) solving strategy, and decomposes the original problem into sub-problems. Each sub-problem is solved by PrecoSAT. The empirical results demonstrate both our SAT translation and new solving technique are efficient. Without the efficient SAT encoding and the new solving technique, Rubik's Cube will not be able to be solved still by any SAT solver. Using the improved SAT solver, we can find always a solution of length 20 in a reasonable time. Although our solver is slower than Kociemba's algorithm using lookup tables, but does not require a huge lookup table.