Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Grill, Jean-Bastien, Valko, Michal, Munos, Remi
–Neural Information Processing Systems
We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e. the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with a finite or an infinite number of transitions from state-action to next states. TrailBlazer is an adaptive algorithm that exploits possible structures of the MDP by exploring only a subset of states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states.
Neural Information Processing Systems
Feb-14-2020, 16:26:45 GMT
- Technology: