Stabilizing Extreme Q-learning by Maclaurin Expansion