SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
