Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Jean-Bastien Grill, Michal Valko, Remi Munos

Neural Information Processing Systems

You are a robot and you live in a Markov decision process (MDP) with a finite or an infinite number of transitions from state-action pairs to next states. You've got brains and so you plan before you act. Luckily, your roboparents equipped you with a generative model to do some Monte-Carlo planning. The world is waiting for you and you have no time to waste. You want your planning to be efficient.