Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Neural Information Processing Systems
The paper's problem-dependent sample complexity bound is expressed in terms of the sub-optimality gaps of the state-action pairs visited during exploration.
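To make the notion of a sub-optimality gap concrete, the sketch below computes it on a toy tabular MDP using the standard definition, gap(s, a) = V*(s) - Q*(s, a), so that optimal actions have gap zero. It is an illustration only: the function name `suboptimality_gaps`, the two-state MDP, and the undiscounted finite-horizon setting are assumptions made here, not taken from the paper, and the code does not reproduce the paper's algorithm or its bound.

```python
def suboptimality_gaps(P, R, horizon):
    """Backward induction on a finite-horizon tabular MDP (hypothetical helper).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the mean reward of action a in state s.
    Returns gaps[h][s][a] = V*_h(s) - Q*_h(s, a), with h = 0 the first step.
    """
    n_states = len(P)
    V = [0.0] * n_states  # value after the final step is zero
    gaps = []
    for _ in range(horizon):
        # Q-values at the current step, built from next-step values V.
        Q = [[R[s][a] + sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V = [max(q) for q in Q]
        gaps.append([[V[s] - q for q in Q[s]] for s in range(n_states)])
    gaps.reverse()  # computed last step first, so flip to time order
    return gaps


# Toy deterministic MDP (assumed for illustration): 2 states, 2 actions.
P = [[[(0, 1.0)], [(1, 1.0)]],   # state 0: stay, or move to state 1
     [[(1, 1.0)], [(0, 1.0)]]]   # state 1: stay, or move to state 0
R = [[0.0, 0.0],
     [1.0, 0.5]]

gaps = suboptimality_gaps(P, R, horizon=2)
print(gaps[0])  # [[1.0, 0.0], [0.0, 1.5]]
```

At the first step, staying in state 0 has gap 1.0 while moving to state 1 has gap 0. Intuitively, a gap-dependent bound of this kind can require fewer samples for actions with large gaps, since they are easier to rule out.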
Oct-2-2025, 01:07:10 GMT