On the Convergence of Policy in Unregularized Policy Mirror Descent
