Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
–Neural Information Processing Systems
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.
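To make the access model concrete, the sketch below wraps a small, made-up two-state DMDP in a sampling oracle and then runs a naive baseline: estimate the transition function from a fixed number of samples per state-action pair and apply value iteration to the estimate. This is not the algorithm proposed in the paper. The toy transition table `P`, the rewards `R`, the helper names, and the sample count `m` are all assumptions chosen for illustration.

```python
import random


def make_generative_model(P, rng):
    """Return an oracle that, given (s, a), samples a next state from P[s][a]."""
    def sample(s, a):
        return rng.choices(range(len(P[s][a])), weights=P[s][a], k=1)[0]
    return sample


def q_value(P_hat, R, v, gamma, s, a):
    """One-step lookahead value of action a in state s under the estimated model."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P_hat[s][a]))


def empirical_value_iteration(sample, R, n_states, n_actions, gamma, m, iters):
    """Naive baseline: build an empirical model from m oracle calls per pair,
    then run value iteration on it and return values and a greedy policy."""
    P_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(m):
                P_hat[s][a][sample(s, a)] += 1.0 / m

    v = [0.0] * n_states
    for _ in range(iters):
        v = [max(q_value(P_hat, R, v, gamma, s, a) for a in range(n_actions))
             for s in range(n_states)]

    policy = [max(range(n_actions), key=lambda a: q_value(P_hat, R, v, gamma, s, a))
              for s in range(n_states)]
    return v, policy


# Hypothetical toy DMDP: state 1 pays reward 1 under action 0; state 0 pays nothing.
P = [
    [[1.0, 0.0], [0.1, 0.9]],  # from state 0: action 0 stays, action 1 mostly moves to 1
    [[0.1, 0.9], [0.9, 0.1]],  # from state 1: action 0 mostly stays, action 1 mostly leaves
]
R = [[0.0, 0.0], [1.0, 0.0]]

oracle = make_generative_model(P, random.Random(0))
values, policy = empirical_value_iteration(
    oracle, R, n_states=2, n_actions=2, gamma=0.9, m=500, iters=200
)
print(values, policy)
```

The solver only ever touches the transition function through `oracle(s, a)`, which mirrors the restriction in the problem statement. The total number of oracle calls here is `m` times the number of state-action pairs, and how small that count can be while still guaranteeing an $\epsilon$-optimal policy is the sample complexity question the paper addresses.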
Dec-26-2025, 00:57:02 GMT