oga-uct
Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm
Schmöcker, Robin, Dockhorn, Alexander, Rosenhahn, Bodo
A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT) is the state-of-the-art MCTS abstraction algorithm for deterministic environments that builds its abstraction using the Abstractions of State-Action Pairs (ASAP) framework, which aims to detect states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, which is a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency. In this paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values as long as the difference between their values can be inferred. We call this abstraction framework Known Value Difference Abstractions (KVDA), which infers the value differences by analysis of the immediate rewards and modifies OGA-UCT to use this framework instead. The modification is called KVDA-UCT, which detects significantly more abstractions than OGA-UCT, introduces no additional parameter, and outperforms OGA-UCT on a variety of deterministic environments and parameter settings.
Time-critical and confidence-based abstraction dropping methods
Schmöcker, Robin, Kampmann, Lennart, Dockhorn, Alexander
One paradigm of Monte Carlo Tree Search (MCTS) improvements is to build and use state and/or action abstractions during the tree search. Non-exact abstractions, however, introduce an approximation error making convergence to the optimal action in the abstract space impossible. Hence, as proposed as a component of Elastic Monte Carlo Tree Search by Xu et al., abstraction algorithms should eventually drop the abstraction. In this paper, we propose two novel abstraction dropping schemes, namely OGA-IAAD and OGA-CAD which can yield clear performance improvements whilst being safe in the sense that the dropping never causes any notable performance degradations contrary to Xu's dropping method. OGA-IAAD is designed for time critical settings while OGA-CAD is designed to improve the MCTS performance with the same number of iterations.
OGA-UCT: On-the-Go Abstractions in UCT
Anand, Ankit (Indian Institute of Technology, Delhi) | Noothigattu, Ritesh (Indian Institute of Technology, Delhi) | ., Mausam (Indian Institute of Technology, Delhi) | Singla, Parag (Indian Institute of Technology, Delhi)
Recent work has begun exploring the value of domain abstractions in Monte-Carlo Tree Search (MCTS) algorithms for probabilistic planning. These algorithms automatically aggregate symmetric search nodes (states or state-action pairs) saving valuable planning time. Existing algorithms alternate between two phases: (1) abstraction computation forcomputing node aggregations, and (2) modified MCTS that use aggregate nodes. We believe that these algorithms do not achieve the full potential of abstractions because of disjoint phases – e.g., it can take a while to recover from erroneous abstractions, or compute better abstractions based on newly found knowledge.In response, we propose On-the-Go Abstractions (OGA), a novel approach in which abstraction computation is tightlyintegrated into the MCTS algorithm. We implement these on top of UCT and name the resulting algorithm OGA-UCT.It has several desirable properties, including (1) rapid use of new information in modifying existing abstractions, (2) elimination of the expensive batch abstraction computationphase, and (3) focusing abstraction computation on important part of the sampled search space. We experimentally compare OGA-UCT against ASAP-UCT, a recent state-of-the-art MDP algorithm as well as vanilla UCT algorithm. We find that OGA-UCT is robust across a suite of planning competition and other MDP domains, and obtains up to 28 % quality improvements.