On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
