DIME:Diffusion-Based Maximum Entropy Reinforcement Learning

Celik, Onur, Li, Zechu, Blessing, Denis, Li, Ge, Palanicek, Daniel, Peters, Jan, Chalvatzaki, Georgia, Neumann, Gerhard

Feb-4-2025–arXiv.org Artificial Intelligence

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges--primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

Feb-4-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.29)
- Europe > Germany (0.28)

Genre:
- Research Report (0.40)

Industry:
- Education (0.46)
- Energy > Oil & Gas
  - Upstream (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Maximum Entropy (1.00)
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found