A Local Temporal Difference Code for Distributional Reinforcement Learning Pablo T ano

Neural Information Processing Systems 

These results naturally pose the question of how such distributional estimators are learned and represented by neural systems.