Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
