Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Grill, Jean-Bastien, Valko, Michal, Munos, Remi

Feb-14-2020, 16:26:45 GMT–Neural Information Processing Systems

We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e. the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with a finite or an infinite number of transitions from state-action to next states. TrailBlazer is an adaptive algorithm that exploits possible structures of the MDP by exploring only a subset of states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states.

blazing, sample-efficient monte-carlo planning, trailblazer, (4 more...)

Neural Information Processing Systems

Feb-14-2020, 16:26:45 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)