DistributionalReinforcementLearningfor Risk-SensitivePolicies

Neural Information Processing Systems 

On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaRoptimizedpolicies.