An Alternate Policy Gradient Estimator for Softmax Policies
