Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

Kedenburg, Gunnar, Fonteneau, Raphael, Munos, Remi

Neural Information Processing Systems 

This paper addresses the problem of online planning in Markov Decision Processes using only a generative model. We propose a new algorithm which is based on the construction of a forest of single successor state planning trees. For every explored state-action, such a tree contains exactly one successor state, drawn from the generative model. The trees are built using a planning algorithm which follows the optimism in the face of uncertainty principle, in assuming the most favorable outcome in the absence of further information. In the decision making step of the algorithm, the individual trees are combined.