Regularized Q-learning