Non-crossing quantile regression for deep reinforcement learning

Neural Information Processing Systems 

Distributional reinforcement learning (DRL) estimates the full distribution over future returns, rather than only its mean, to capture the intrinsic uncertainty of MDPs more efficiently.