Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

Neural Information Processing Systems 

We prove convergence of the joint distributional Bellman operator and build an empirical algorithm that minimizes the Maximum Mean Discrepancy (MMD) between the joint return distribution and its Bellman target.
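
To make the objective concrete, here is a minimal sketch of a squared-MMD estimate between two sets of multi-dimensional return samples. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the fixed bandwidth, the particle (sample-based) representation of the return distribution, and the names `gaussian_kernel`, `mmd_squared`, and `bellman_target` are all chosen here for the example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and y (m, d).

    Assumption: the kernel and bandwidth are illustrative choices.
    """
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    k_xx = gaussian_kernel(particles, particles, bandwidth).mean()
    k_yy = gaussian_kernel(targets, targets, bandwidth).mean()
    k_xy = gaussian_kernel(particles, targets, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def bellman_target(reward, next_particles, gamma=0.99):
    """Shift and scale next-state return samples: r + gamma * Z(s', a').

    reward has shape (d,), one entry per reward dimension;
    next_particles has shape (n, d), one row per joint return sample.
    """
    return reward[None, :] + gamma * next_particles


# Hypothetical usage with 2-dimensional rewards and 8 particles.
rng = np.random.default_rng(0)
current = rng.normal(size=(8, 2))         # samples of the current joint return
next_samples = rng.normal(size=(8, 2))    # samples of the next-state joint return
target = bellman_target(np.array([1.0, 0.5]), next_samples, gamma=0.9)
loss = mmd_squared(current, target)       # quantity a learner would minimize
```

Each particle is a full reward vector, so the kernel compares joint samples rather than per-dimension marginals, which lets the estimate reflect correlation between reward dimensions. In a learning setting the particles would be produced by a parameterized model and the loss minimized by gradient descent with the target held fixed.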
