Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
