Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

Mar-17-2026, 00:05:03 GMT–Neural Information Processing Systems

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) when its transition function can be accessed only through a generative sampling model, which, given any state-action pair, samples from the transition function in $O(1)$ time.
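To make the access model concrete, here is a minimal sketch under stated assumptions. A policy $\pi$ is $\epsilon$-optimal if $V^\pi(s) \ge V^*(s) - \epsilon$ for every state $s$. The sketch assumes a small tabular MDP stored as a NumPy array `P[s, a, s']` with a reward table `R[s, a]`. These are hypothetical representations, not taken from the paper, and `P` is used only to simulate the sampler. The sketch shows the simplest way to use a generative model: draw `m` samples per state-action pair, build an empirical transition model, run value iteration on it, and return the greedy policy. This plain empirical-model baseline is *not* the paper's algorithm. The paper's near-optimal time and sample bounds come from a more refined method. The function names below are illustrative only.

```python
import numpy as np


def generative_sample(P, s, a, rng):
    """Hypothetical generative model: return one s' ~ P(. | s, a).

    The true table P is used here only to simulate the sampler; the
    learner below never reads P directly, only these samples.
    """
    return rng.choice(P.shape[2], p=P[s, a])


def empirical_value_iteration(P, R, gamma, m, iters, seed=0):
    """Baseline (not the paper's method): estimate P from m samples per
    (s, a), run value iteration on the empirical model, and return the
    greedy policy together with its estimated values."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    P_hat = np.zeros_like(P, dtype=float)
    for s in range(S):
        for a in range(A):
            for _ in range(m):
                P_hat[s, a, generative_sample(P, s, a, rng)] += 1.0
    P_hat /= m  # empirical transition probabilities

    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P_hat @ V)  # Bellman backup, shape (S, A)
        V = Q.max(axis=1)
    Q = R + gamma * (P_hat @ V)
    return Q.argmax(axis=1), V


def policy_value(P, R, gamma, pi):
    """Exact value of a deterministic policy pi in the true MDP:
    V^pi = (I - gamma * P_pi)^{-1} R_pi."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, pi]  # (S, S) transition matrix under pi
    R_pi = R[idx, pi]  # (S,) rewards under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
```

The sketch makes $m|S||A|$ calls to the generative model in total. On toy instances, `policy_value` lets one check $\epsilon$-optimality directly by comparing the returned policy's true values against $V^*$.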

artificial intelligence, machine learning, proceedings
