Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Neural Information Processing Systems 

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.
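As an illustration of the family, the sketch below runs exact PMD (exact Q-values from a linear solve) on a small random MDP. It uses the KL divergence as the Bregman divergence, which is one standard member of the family. With rewards to be maximised, that update has the closed form pi_{k+1}(a|s) ∝ pi_k(a|s) exp(eta_k Q^{pi_k}(s,a)). This is the natural policy gradient update under a softmax parameterisation, and letting eta_k → ∞ recovers policy iteration. The MDP size, random seed, iteration count and the geometrically increasing schedule eta_k = gamma^(-k) are illustrative assumptions, not the step-size rule analysed in the paper. The helper names (policy_evaluation, kl_pmd_step, value_iteration) are hypothetical.

```python
import numpy as np


def policy_evaluation(P, r, pi, gamma):
    """Exact evaluation: solve (I - gamma * P_pi) V = r_pi, then Q = r + gamma * P V.

    P: (S, A, S) transition probabilities, r: (S, A) rewards, pi: (S, A) policy.
    """
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.sum(pi * r, axis=1)          # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                  # (S, A) action values
    return V, Q


def kl_pmd_step(log_pi, Q, eta):
    """One KL-PMD step in log space: log pi_{k+1} = log pi_k + eta * Q, renormalised per state."""
    logits = log_pi + eta * Q
    m = logits.max(axis=1, keepdims=True)  # log-sum-exp shift for numerical stability
    return logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))


def value_iteration(P, r, gamma, iters=2000):
    """Reference optimal values V*, used only to measure PMD's suboptimality."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = np.max(r + gamma * P @ V, axis=1)
    return V


# Hypothetical toy MDP: 5 states, 3 actions, random transitions and rewards in [0, 1).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))

log_pi = np.full((S, A), -np.log(A))  # uniform initial policy
for k in range(100):
    _, Q = policy_evaluation(P, r, np.exp(log_pi), gamma)
    eta = 1.0 / gamma**k  # assumed increasing step size, not the paper's rule
    log_pi = kl_pmd_step(log_pi, Q, eta)

V, _ = policy_evaluation(P, r, np.exp(log_pi), gamma)
V_star = value_iteration(P, r, gamma)
print("max |V* - V^pi| =", np.max(np.abs(V_star - V)))
```

Keeping the policy in log space keeps the update numerically stable as eta_k grows. The logits stay finite even when the probabilities of suboptimal actions underflow to zero. Replacing the KL term with a different Bregman divergence changes only kl_pmd_step, and the exact evaluation step stays the same.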