Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
Neural Information Processing Systems
Our convergence analysis of policy mirror descent (PMD) avoids the performance difference lemma, yielding a direct analysis that is of independent interest.
Oct-9-2025, 11:29:40 GMT