Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning

Neural Information Processing Systems 

Left: The reward function is a mixture of Gaussians, and the offline data distribution is unbalanced, with most samples located in low-reward states.
