Variance-reduced $Q$-learning is minimax optimal
