Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
Neural Information Processing Systems
In this paper we consider the problem of computing an ɛ-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in O(1) time.
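To make the access model concrete, the following is a minimal Python sketch of a generative sampling oracle together with a naive sample-based value iteration that uses it. This is not the algorithm of the paper: it is a plain empirical baseline, and the names `GenerativeModel`, `sample`, and `sampled_value_iteration`, the known reward table `r`, and the parameters `m` (samples per state-action pair) and `iters` are all assumptions introduced here for illustration.

```python
import numpy as np


class GenerativeModel:
    """Hypothetical sampling oracle: a solver may call sample(), but never reads P."""

    def __init__(self, P, seed=0):
        # P has shape (S, A, S); P[s, a] is the next-state distribution.
        self._P = np.asarray(P, dtype=float)
        self._rng = np.random.default_rng(seed)
        self.num_states, self.num_actions, _ = self._P.shape

    def sample(self, s, a):
        # One draw of the next state from P(. | s, a).
        return int(self._rng.choice(self.num_states, p=self._P[s, a]))


def sampled_value_iteration(model, r, gamma, m, iters):
    """Naive baseline (an assumption, not the paper's method):
    estimate transitions from m samples per pair, then run value iteration."""
    S, A = model.num_states, model.num_actions

    # Build an empirical transition estimate using only oracle calls.
    counts = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(m):
                counts[s, a, model.sample(s, a)] += 1
    P_hat = counts / m

    # Standard value iteration on the empirical model.
    v = np.zeros(S)
    for _ in range(iters):
        v = (r + gamma * (P_hat @ v)).max(axis=1)

    q = r + gamma * (P_hat @ v)  # shape (S, A)
    return q.argmax(axis=1), v
```

A caller would construct `GenerativeModel(P)` from a transition array of shape `(S, A, S)`, pass a reward array `r` of shape `(S, A)`, and read off the greedy policy and value estimate. The total number of oracle calls in this sketch is `S * A * m`, which is the quantity a sample-complexity bound would count.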
Oct-8-2024, 02:08:11 GMT