mdp-gape
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have finite support. We prove an upper bound on the number of sampled trajectories needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly theoretical.
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Jonsson, Anders, Kaufmann, Emilie, Ménard, Pierre, Domingues, Omar Darwiche, Leurent, Edouard, Valko, Michal
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have finite support. We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly theoretical.
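To make the notion of a sub-optimality gap concrete, here is a minimal sketch that computes gaps exactly on a toy finite-horizon MDP with finite-support transitions. It is not MDP-GapE itself, which works from sampled trajectories rather than a known model. The sketch assumes the common definition gap(s, a) = V*(s) - Q*(s, a), which may differ in detail from the paper's; the MDP, its numbers, and the names `P`, `R`, `backward_induction`, and `gaps` are all hypothetical.

```python
# Toy MDP, invented for illustration only (not taken from the paper).
# P[s][a] is a list of (probability, next_state) pairs: finite support.
# R[s][a] is a deterministic reward in [0, 1].
P = {
    0: {0: [(0.8, 0), (0.2, 1)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
}
R = {
    0: {0: 0.0, 1: 0.5},
    1: {0: 1.0, 1: 0.2},
}
HORIZON = 3   # planning depth (assumed)
GAMMA = 1.0   # discount factor (assumed)


def backward_induction(P, R, horizon, gamma):
    """Return optimal Q-values per depth; index 0 is the root depth."""
    V = {s: 0.0 for s in P}  # value after the final step is zero
    q_by_depth = []
    for _ in range(horizon):
        Q = {
            s: {
                a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in P[s]
            }
            for s in P
        }
        V = {s: max(Q[s].values()) for s in P}
        q_by_depth.append(Q)
    q_by_depth.reverse()  # computed last-to-first, so flip to root-first
    return q_by_depth


def gaps(Q):
    """Assumed gap definition: max over actions of Q(s, .) minus Q(s, a)."""
    return {
        s: {a: max(Q[s].values()) - q for a, q in Q[s].items()}
        for s in Q
    }


Q_BY_DEPTH = backward_induction(P, R, HORIZON, GAMMA)
ROOT_GAPS = gaps(Q_BY_DEPTH[0])[0]  # gaps of each action at root state 0
print(ROOT_GAPS)
```

An action with a large gap can be ruled out with few samples, while actions with small gaps are harder to tell apart. A gap-dependent bound reflects this by charging more samples to the state-action pairs whose gaps are small.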