Single-Agent Policy Tree Search With Guarantees
Orseau, Laurent; Lelis, Levi; Lattimore, Tor; Weber, Theophane
Neural Information Processing Systems
We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to provide an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for "needle-in-a-haystack" problems. The second algorithm, which is based on sampling, provides an upper bound on the expected number of nodes to be expanded before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal.
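To make the first idea concrete, here is a minimal sketch of a policy-guided best-first enumeration. The abstract does not state the cost function, so the sketch assumes one: a node's cost is its depth (root at depth 1) divided by the product of policy probabilities along its path. Under that assumed cost, the number of expansions before reaching a goal node is at most that goal's depth divided by its path probability. The function name, the callback signatures, and the bit-string toy problem are all hypothetical, chosen only for illustration.

```python
import heapq
from itertools import count


def policy_guided_best_first(root, children, policy, is_goal, max_expansions=10_000):
    """Expand nodes in increasing order of depth / path_probability.

    children(state) -> list of successor states
    policy(state)   -> list of probabilities, one per successor
    Returns (goal_state, expansions), or (None, expansions) if no goal is found.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entries: (cost, tie, depth, path_probability, state); the root has depth 1.
    frontier = [(1.0, next(tie), 1, 1.0, root)]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, depth, prob, state = heapq.heappop(frontier)
        expansions += 1
        if is_goal(state):
            return state, expansions
        for child, p in zip(children(state), policy(state)):
            if p > 0.0:  # zero-probability children have infinite cost; skip them
                child_prob = prob * p
                cost = (depth + 1) / child_prob  # assumed cost: depth / path probability
                heapq.heappush(frontier, (cost, next(tie), depth + 1, child_prob, child))
    return None, expansions


# Toy problem (hypothetical): states are bit strings, the goal is one specific string.
def toy_children(state):
    return [state + "0", state + "1"] if len(state) < 4 else []


def toy_policy(state):
    return [0.2, 0.8] if len(state) < 4 else []  # fixed policy that prefers "1"


goal, expanded = policy_guided_best_first("", toy_children, toy_policy, lambda s: s == "1101")
bound = 5 / (0.8 * 0.8 * 0.2 * 0.8)  # goal depth (root counted as 1) / path probability
print(goal, expanded, bound)
```

The better the policy's probability on the path to the goal, the smaller the bound, which is why an enumeration of this kind suits problems with a single hard-to-find solution.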
Feb-14-2020, 12:11:55 GMT