Reviews: Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems 

The submission introduces a method for distributional reward decomposition that is more generally applicable than prior work, removing the need for arbitrary environment resets as well as for domain knowledge. To further strengthen disentanglement, the objective is extended to maximise the KL divergence between the distributions resulting from actions that optimise for different subrewards (treating the learned Q functions as epsilon-greedy policies). Overall, the work makes a valuable contribution to RL by investigating (and benefitting from) reward decomposition in a distributional setting. The combination of reward decomposition and distributional RL is novel and, as demonstrated in the experimental section, yields better agent performance by exploiting task structure. In this context, it would be interesting to see how the approach fares in tasks with only a single source of reward, and to identify situations where the method might perform worse than the baseline.
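To make the disentanglement term concrete, here is a minimal sketch of one possible reading, not the submission's implementation. Assumptions (all hypothetical): each subreward channel has a categorical return distribution over shared atoms for every action; each channel's expected values define an epsilon-greedy policy; the "resulting distribution" is taken to be a shared per-action outcome distribution mixed by that policy; and the bonus is the sum of pairwise KL divergences between channels. The names `disentanglement_bonus`, `outcome_dist` and `eps` are illustrative only.

```python
import math

def expected_values(dist, atoms):
    """Expected return of each action under a categorical return distribution."""
    return [sum(p * z for p, z in zip(probs, atoms)) for probs in dist]

def epsilon_greedy(q_values, eps):
    """Epsilon-greedy action probabilities derived from one channel's Q-values."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    return [(1.0 - eps if a == best else 0.0) + eps / n for a in range(n)]

def induced_distribution(policy, outcome_dist):
    """Mix per-action outcome distributions by the policy's action probabilities.

    Assumption: outcome_dist[a] is a categorical distribution shared by all channels.
    """
    n_atoms = len(outcome_dist[0])
    return [sum(policy[a] * outcome_dist[a][j] for a in range(len(policy)))
            for j in range(n_atoms)]

def kl(p, q, tiny=1e-12):
    """KL(p || q) for categorical distributions; tiny guards against log(0)."""
    return sum(pi * math.log((pi + tiny) / (qi + tiny)) for pi, qi in zip(p, q) if pi > 0)

def disentanglement_bonus(sub_dists, outcome_dist, atoms, eps=0.1):
    """Sum of pairwise KL divergences between the channels' induced distributions.

    Hypothetical formulation: a training loss would subtract a weighted copy of
    this term so that channels are pushed towards preferring different actions.
    """
    policies = [epsilon_greedy(expected_values(d, atoms), eps) for d in sub_dists]
    induced = [induced_distribution(pi, outcome_dist) for pi in policies]
    total = 0.0
    for i in range(len(induced)):
        for j in range(len(induced)):
            if i != j:
                total += kl(induced[i], induced[j])
    return total

# Two channels over two actions and two atoms: channel 0 prefers action 0,
# channel 1 prefers action 1, so the bonus is positive.
atoms = [0.0, 1.0]
sub_dists = [[[0.0, 1.0], [1.0, 0.0]],
             [[1.0, 0.0], [0.0, 1.0]]]
outcome_dist = [[0.9, 0.1], [0.1, 0.9]]
print(disentanglement_bonus(sub_dists, outcome_dist, atoms))
```

If both channels prefer the same action, their induced distributions coincide and the bonus is zero, which illustrates why maximising it encourages the subrewards to specialise.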