Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Neural Information Processing Systems
This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration.
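For context, the sub-optimality gap of a state-action pair is commonly defined as the difference between the optimal value of the state and the optimal action value of the pair. The sketch below uses a standard convention assumed here for illustration; the symbols $V^\star$, $Q^\star$, and $\Delta$ do not appear in the text above, and the paper's own definition may differ (for example, by indexing on the planning depth).

```latex
% Assumed standard notation (not from the source):
% V^\star and Q^\star are the optimal state and state-action value functions.
\Delta(s,a) \;=\; V^\star(s) - Q^\star(s,a),
\qquad
V^\star(s) \;=\; \max_{a'} Q^\star(s,a')
```

Under this convention, $\Delta(s,a) \ge 0$, with equality exactly when $a$ is an optimal action in state $s$.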
Dec-27-2025, 23:06:48 GMT