Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
Kedenburg, Gunnar, Fonteneau, Raphael, Munos, Remi
–Neural Information Processing Systems
This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a "safe" optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the tradeoff between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions.
Neural Information Processing Systems
Dec-31-2013