Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Mar-20-2026, 22:56:37 GMT–Neural Information Processing Systems

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically built on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow).
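The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's EBFlow method, and it is not a continuous-action actor-critic: it assumes a small tabular MDP with discrete actions so that the soft Q-function and the policy can be stored as arrays. The MDP itself (`rewards`, `transitions`), the temperature `alpha`, the discount `gamma`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def soft_value(q, alpha):
    """V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    m = q.max(axis=1)
    return m + alpha * np.log(np.exp((q - m[:, None]) / alpha).sum(axis=1))

def improve_policy(q, alpha):
    """Policy improvement: pi(a | s) proportional to exp(Q(s, a) / alpha)."""
    return np.exp((q - soft_value(q, alpha)[:, None]) / alpha)

def evaluate_policy(q, pi, rewards, transitions, gamma, alpha, sweeps=20):
    """Policy evaluation: repeated soft Bellman backups for a fixed policy."""
    for _ in range(sweeps):
        # Soft state value: expected Q under pi plus the entropy bonus of pi.
        v = (pi * (q - alpha * np.log(pi))).sum(axis=1)
        q = rewards + gamma * transitions @ v
    return q

# Hypothetical toy MDP (assumption): random rewards and transition kernel.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 4, 3, 0.9, 0.5
rewards = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
transitions = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# Alternate the two steps, starting from a zero critic and a uniform actor.
q = np.zeros((n_states, n_actions))
pi = improve_policy(q, alpha)
for _ in range(200):
    q = evaluate_policy(q, pi, rewards, transitions, gamma, alpha)
    pi = improve_policy(q, alpha)
```

In this tabular setting the improvement step has a closed form (a softmax over the soft Q-values). In continuous action spaces the normalizing integral is generally intractable, which is why the methods the abstract refers to train a separate actor to follow the updated critic.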

artificial intelligence, machine learning, reinforcement learning
