Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime
