Distributional Reinforcement Learning for Multi-Dimensional Reward Functions
–Neural Information Processing Systems
We prove the convergence for the joint distributional Bellman operator and build our empirical algorithm by minimizing the Maximum Mean Discrepancy between joint return distribution and its Bellman target.
Neural Information Processing Systems
Dec-27-2025, 23:07:04 GMT