Reviews: Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Neural Information Processing Systems
In this paper the authors present PAC-RL results for finite-space discounted MDPs with a generative sampling model (i.e., an oracle that can generate, in O(1) time, a sampled state transition for any given state-action pair). They introduce a value-iteration-type algorithm for finding an \epsilon-optimal policy, and prove that its running time and sample complexity are nearly the best possible, matching, up to logarithmic factors, the lower bounds established previously in [AMK13]. These results are significant: they improve over a recent result in [SWWY18] by a factor of (1 - \gamma)^{-1} (where \gamma is the discount factor), and they also fill a known gap in the earlier results of [AMK13] regarding the sample complexity of finding a near-optimal policy. Both the algorithm and its analysis are very interesting. They build upon the prior works [AMK13] and [SWWY18], combining the algorithmic ideas and proof techniques from both papers in a highly non-trivial and original way.
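To make the setting concrete, the following is a minimal sketch of the generic "sampled Bellman backup" idea that underlies value-iteration-type algorithms with a generative model: the exact expectation E[V(s')] in each backup is replaced by an empirical average over m transition samples drawn from the oracle. This is an illustrative toy (the MDP, the sample size m, and all names here are hypothetical), not the authors' actual algorithm, which additionally uses variance-reduction and monotonicity techniques to reach the near-optimal complexity bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: S states, A actions, rewards in [0, 1].
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # true (unknown) transition kernel
R = rng.uniform(size=(S, A))                 # reward table

def generative_model(s, a, m):
    """Oracle: draw m i.i.d. next-state samples for the pair (s, a)."""
    return rng.choice(S, size=m, p=P[s, a])

def sampled_value_iteration(m=200, iters=100):
    """Approximate value iteration using sampled Bellman backups."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                samples = generative_model(s, a, m)
                # Empirical backup: mean over samples replaces E[V(s')].
                Q[s, a] = R[s, a] + gamma * V[samples].mean()
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

V_hat, policy = sampled_value_iteration()
```

Since rewards lie in [0, 1], every value estimate is bounded by (1 - \gamma)^{-1} = 10; the per-iteration cost is O(S · A · m), reflecting the O(1) cost of each oracle call.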
Oct-8-2024, 02:08:13 GMT