Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Harley Wiltzer
Mila-Québec AI Institute, McGill University

Marc G. Bellemare

Neural Information Processing Systems 

In addition, we build a superiority-based DRL algorithm.