Softmax Policy Gradient Methods Can Take Exponential Time to Converge
