Regularized Q-Learning