KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
