Reviews: Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Neural Information Processing Systems
In this paper the authors present PAC-RL results for finite-space discounted MDPs with a generative sampling model (i.e., an oracle that can generate, in O(1) time, a sampled state transition for any given state-action pair). They introduce a value-iteration-type algorithm for finding an \epsilon-optimal policy, and prove that its running time and sample complexity are nearly the best possible, matching, up to logarithmic factors, the lower bounds established previously in [AMK13]. These results are significant: they improve over a recent result in [SWWY18] by a factor of (1 - \gamma)^{-1} (where \gamma is the discount factor), and they also fill a known gap in the earlier results of [AMK13] regarding the sample complexity of finding a near-optimal policy. Both the algorithm and its analysis are very interesting. They build upon the prior works [AMK13] and [SWWY18], combining the algorithmic ideas and proof techniques from both papers in a highly non-trivial and original way.
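To make the setting concrete, the following is a minimal sketch of the generic "sampled Bellman backup" idea that underlies value-iteration-type algorithms with a generative model: the exact expectation E[V(s')] in each backup is replaced by an empirical average over m transition samples drawn from the oracle. This is an illustrative toy (the MDP, the sample size m, and all names here are hypothetical), not the authors' actual algorithm, which additionally uses variance-reduction and monotonicity techniques to reach the near-optimal complexity bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: S states, A actions, rewards in [0, 1].
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # true (unknown) transition kernel
R = rng.uniform(size=(S, A))                 # reward table

def generative_model(s, a, m):
    """Oracle: draw m i.i.d. next-state samples for the pair (s, a)."""
    return rng.choice(S, size=m, p=P[s, a])

def sampled_value_iteration(m=200, iters=100):
    """Approximate value iteration using sampled Bellman backups."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                samples = generative_model(s, a, m)
                # Empirical backup: mean over samples replaces E[V(s')].
                Q[s, a] = R[s, a] + gamma * V[samples].mean()
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

V_hat, policy = sampled_value_iteration()
```

Since rewards lie in [0, 1], every value estimate is bounded by (1 - \gamma)^{-1} = 10; the per-iteration cost is O(S · A · m), reflecting the O(1) cost of each oracle call.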
Oct-8-2024, 02:08:13 GMT