Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning
Neural Information Processing Systems
Left: The reward function is a mixture of Gaussians, and the offline data distribution is unbalanced, with most samples located in low-reward states.
Feb-17-2026, 14:16:46 GMT
- Country:
  - Europe
    - Sweden
      - Uppsala County > Uppsala (0.04)
      - Östergötland County > Linköping (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (0.68)
- Technology: