Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime