Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Chen-Hao Chao, Wei-Fang Sun
Neural Information Processing Systems
Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically built on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow).
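For context, the MaxEnt RL objective underlying the alternating scheme described above is standard; a sketch in common notation (with temperature α and policy entropy H, symbols assumed rather than taken from this paper):

```latex
% MaxEnt RL objective: maximize return plus entropy bonus
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]

% Soft Bellman backup used in the policy evaluation step
Q_{\text{soft}}(s_t, a_t) = r(s_t, a_t) + \gamma \,
  \mathbb{E}_{s_{t+1}}\Big[ \mathbb{E}_{a_{t+1} \sim \pi}\big[
    Q_{\text{soft}}(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1})
  \big] \Big]
```

The policy improvement step then adjusts the actor toward the Boltzmann distribution induced by the soft Q-function.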
Mar-21-2025, 18:30:38 GMT