Stabilizing Extreme Q-learning by Maclaurin Expansion
