Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
Neural Information Processing Systems
In this paper we consider the problem of computing an ε-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in O(1) time.
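To make the access model concrete, the following is a minimal sketch of a generative sampling oracle for a small finite DMDP. It is an illustration only, not the paper's algorithm: the class name `GenerativeModel`, the `transitions[s, a]` array layout, the `num_samples` counter, and the toy transition probabilities are all assumptions introduced here.

```python
import numpy as np


class GenerativeModel:
    """Hypothetical sampling oracle for a DMDP with finite states and actions."""

    def __init__(self, transitions, seed=0):
        # Assumed layout: transitions[s, a] is a probability vector over next states.
        self.transitions = np.asarray(transitions, dtype=float)
        self.rng = np.random.default_rng(seed)
        self.num_samples = 0  # number of oracle calls made so far

    def sample(self, state, action):
        # Draw one next state s' ~ P(. | state, action); each call counts as one sample.
        self.num_samples += 1
        probs = self.transitions[state, action]
        return int(self.rng.choice(len(probs), p=probs))


# Toy 2-state, 2-action DMDP (made-up numbers, for illustration only).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
])
model = GenerativeModel(P)

# The solver never reads P directly; it only sees sampled next states.
draws = [model.sample(0, 1) for _ in range(1000)]
empirical = np.bincount(draws, minlength=2) / len(draws)
```

Here `empirical` is a sample-based estimate of the transition row for state 0 and action 1, and `model.num_samples` tracks how many oracle calls were spent, which is the quantity a sample-complexity bound counts.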