Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
Neural Information Processing Systems
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.
Dec-27-2025, 04:39:04 GMT