Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

Dec-27-2025, 04:39:04 GMT–Neural Information Processing Systems

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning.

exact policy mirror descent, optimal convergence rate, policy mirror descent

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.59)