Double Gumbel Q-Learning

Apr-24-2026, 11:10:02 GMT–Neural Information Processing Systems

We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning, a Deep Q-Learning algorithm applicable for both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable and we therefore derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for our pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D and show that tuning this hyperparameter may further improve sample efficiency.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Apr-24-2026, 11:10:02 GMT

Conferences PDF

Add feedback

Country:
- North America > Canada (0.28)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.28)

Genre:
- Research Report > New Finding (0.45)

Industry:
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.86)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.46)

Duplicate Docs Excel Report

Title
07956d40074d6523bad11112b3225c6e-Paper-Conference.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found