On the Global Convergence Rates of Softmax Policy Gradient Methods
