Diffusion Policies Creating a Trust Region for Offline Reinforcement Learning

Neural Information Processing Systems 

Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference.
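To make the sampling cost concrete, here is a minimal sketch of iterative denoising for a single action. It assumes a standard DDPM-style reverse process with a linear noise schedule. It is not the DQL implementation. `ToyNoisePredictor`, `make_schedule`, and `sample_action` are hypothetical names introduced for illustration, and the toy predictor is a fixed function rather than a trained network. The point is only that one action requires one network evaluation per denoising step.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


class ToyNoisePredictor:
    """Hypothetical stand-in for a learned noise predictor eps(a_t, s, t)."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_state = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.calls = 0  # counts forward passes

    def __call__(self, action, state, t):
        self.calls += 1
        # The toy function ignores t; a real network would condition on it.
        return np.tanh(action - state @ self.w_state)


def sample_action(eps_model, state, action_dim, num_steps, rng):
    """DDPM-style ancestral sampling: one model call per denoising step."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.normal(size=action_dim)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = eps_model(action, state, t)
        # Posterior mean of the reverse step given the predicted noise.
        mean = (action - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.normal(size=action_dim) if t > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[t]) * noise
    return action


rng = np.random.default_rng(0)
state = rng.normal(size=4)
model = ToyNoisePredictor(state_dim=4, action_dim=2)
action = sample_action(model, state, action_dim=2, num_steps=5, rng=rng)
print("forward passes for one action:", model.calls)
```

With `num_steps` denoising steps, each sampled action costs `num_steps` forward passes. In training, gradients that flow through the sampled action must also pass back through every step of that chain. A policy that produces an action in a single forward pass avoids this cost.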
