Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

May-27-2025, 03:47:07 GMT–Neural Information Processing Systems

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow). Our method enables the calculation of the soft value function used in the policy evaluation target without Monte Carlo approximation.
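The claim about avoiding Monte Carlo approximation can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes a one-dimensional policy given by a single affine flow layer a = mu + sigma * z with a standard normal prior on z, and it assumes the soft Q-function is read off the prior term while the soft value function is read off the state-dependent Jacobian term. The names `ALPHA`, `soft_q`, `soft_v_closed_form`, and `soft_v_numeric` are hypothetical and chosen for illustration only. Under these assumptions, the soft value V(s) = alpha * log ∫ exp(Q(s, a) / alpha) da has the closed form alpha * log(sigma), which the code compares against a brute-force numerical integral.

```python
import numpy as np

ALPHA = 0.2  # temperature; hypothetical value chosen for illustration


def log_std_normal(z):
    """Log-density of the standard normal prior p_z."""
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)


def soft_q(a, mu, sigma, alpha=ALPHA):
    """Toy soft Q-function: alpha * log p_z(g(a|s)), with g(a|s) = (a - mu) / sigma.

    Assumption: the action-dependent part of the flow density plays the
    role of the (unnormalized) energy.
    """
    z = (a - mu) / sigma
    return alpha * log_std_normal(z)


def soft_v_closed_form(sigma, alpha=ALPHA):
    """Toy soft value: -alpha * log|dg/da| = alpha * log(sigma).

    No sampling or integration over actions is required.
    """
    return alpha * np.log(sigma)


def soft_v_numeric(mu, sigma, alpha=ALPHA, n=20001, width=12.0):
    """Reference value: alpha * log of the integral of exp(Q / alpha) over a,
    approximated with a Riemann sum on a dense grid."""
    a = np.linspace(mu - width * sigma, mu + width * sigma, n)
    da = a[1] - a[0]
    integral = np.sum(np.exp(soft_q(a, mu, sigma, alpha) / alpha)) * da
    return alpha * np.log(integral)


# Hypothetical state-dependent flow parameters for a single state s.
mu, sigma = 0.3, 1.7

v_exact = soft_v_closed_form(sigma)
v_num = soft_v_numeric(mu, sigma)

# The induced policy pi(a|s) = exp((Q - V) / alpha) should integrate to one.
grid = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 20001)
policy_mass = np.sum(np.exp((soft_q(grid, mu, sigma) - v_exact) / ALPHA)) * (grid[1] - grid[0])

print(v_exact, v_num, policy_mass)
```

In this toy setting the two values agree because the integral of the prior density over actions equals sigma, the inverse of the flow's Jacobian. A deeper flow would need its layers arranged so that the normalizing term stays independent of the action; how the paper achieves that is not stated in the abstract.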

energy-based normalizing flow, machine learning, reinforcement learning, (5 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Maximum Entropy (0.65)
  - Reinforcement Learning (0.65)