Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
Neural Information Processing Systems
Unlike Gaussian policies, the log-likelihood in diffusion policies is inaccessible; thus this entropy term is nontrivial. Moreover, to reduce the large variance of diffusion policies, we also develop an efficient behavior policy through action selection. This can further improve its sample efficiency during online interaction.
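The excerpt does not spell out how the action selection works. As a minimal sketch of one plausible reading, the behavior policy could draw several candidate actions from the stochastic policy and execute the one a learned Q-function scores highest, which narrows the spread of the actions actually taken. Everything named here (`select_action`, `sample_action`, `q_value`, `num_candidates`, and the toy sampler and Q-function) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
import random
from typing import Callable, List

Action = List[float]


def select_action(
    sample_action: Callable[[], Action],
    q_value: Callable[[Action], float],
    num_candidates: int = 8,
) -> Action:
    """Draw several candidate actions and return the one with the highest Q-value.

    Assumption: selection is a plain argmax over sampled candidates; the
    source excerpt does not state the exact rule.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_action() for _ in range(num_candidates)]
    return max(candidates, key=q_value)


# Toy stand-ins (hypothetical): a noisy 1-D sampler in place of a diffusion
# policy, and a Q-function that peaks at action 0.5.
rng = random.Random(0)


def toy_sampler() -> Action:
    return [rng.gauss(0.0, 1.0)]


def toy_q(action: Action) -> float:
    return -(action[0] - 0.5) ** 2


if __name__ == "__main__":
    chosen = select_action(toy_sampler, toy_q, num_candidates=16)
    print("selected action:", chosen)
```

With more candidates, the selected action tends to sit closer to the Q-function's peak, at the cost of extra policy samples and Q evaluations per environment step.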
Feb-15-2026, 04:47:43 GMT