Efficient Diffusion Policies For Offline Reinforcement Learning
Neural Information Processing Systems
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of the policy is crucial but often overlooked. Recently, Diffusion-QL significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parameterized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations.
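The sampling cost mentioned above can be illustrated with a minimal sketch of standard DDPM-style ancestral sampling for a state-conditioned action. The noise predictor `eps_model`, the linear variance schedule, and the names `linear_betas` and `sample_action` are hypothetical choices for illustration; this is not the implementation of Diffusion-QL or of this paper. What it shows is that each sampled action needs one network evaluation per diffusion step, so a chain with hundreds of steps means hundreds of forward passes per action.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical noise-prediction network: (noisy_action, state, step) -> predicted noise.
EpsModel = Callable[[List[float], List[float], int], List[float]]


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced variance schedule beta_1..beta_T (an assumed, common choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sample_action(
    state: List[float],
    eps_model: EpsModel,
    action_dim: int,
    num_steps: int,
    rng: random.Random,
) -> Tuple[List[float], int]:
    """DDPM-style ancestral sampling of one action; returns (action, network calls)."""
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product \bar{alpha}_t

    # Start the reverse chain from pure Gaussian noise a_T ~ N(0, I).
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    calls = 0
    for t in reversed(range(num_steps)):
        eps = eps_model(action, state, t)  # one forward pass per diffusion step
        calls += 1
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(a - coef * e) / math.sqrt(alphas[t]) for a, e in zip(action, eps)]
        if t > 0:
            # Add fresh noise with variance beta_t on every step except the last.
            sigma = math.sqrt(betas[t])
            action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            action = mean
    return action, calls


if __name__ == "__main__":
    zero_eps: EpsModel = lambda a, s, t: [0.0] * len(a)  # placeholder, not a trained model
    rng = random.Random(0)
    for steps in (5, 100):
        _, calls = sample_action([0.1, -0.3], zero_eps, action_dim=2, num_steps=steps, rng=rng)
        print(f"{steps} diffusion steps -> {calls} network calls per action")
```

With a trained network in place of the placeholder `zero_eps`, the loop structure stays the same: the number of forward passes per action equals the number of diffusion steps, and this cost applies whenever the full chain must be run.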
Dec-26-2025, 21:05:25 GMT