Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Neural Information Processing Systems 

Our convergence analysis of policy mirror descent (PMD) avoids the performance difference lemma, yielding a direct analysis that is of independent interest.
