Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo