Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Neural Information Processing Systems
In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function.
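The alternation described above can be illustrated with a minimal tabular sketch. This is not the paper's energy-based normalizing flow method; it is the standard soft policy iteration scheme for a small discrete MDP, written under stated assumptions. The function names, the temperature `alpha`, the discount `gamma`, and the array layout (`P[s, a, s']` for transitions, `R[s, a]` for rewards) are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_policy_evaluation(P, R, pi, alpha=0.5, gamma=0.9, iters=200):
    """Critic step: iterate the soft Bellman backup for a fixed policy pi."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        # Soft state value: expected Q plus an entropy bonus weighted by alpha.
        # The floor inside the log only guards against log(0).
        log_pi = np.log(np.maximum(pi, 1e-12))
        V = np.sum(pi * (Q - alpha * log_pi), axis=1)
        # P has shape (S, A, S), so P @ V has shape (S, A).
        Q = R + gamma * (P @ V)
    return Q


def soft_policy_improvement(Q, alpha=0.5):
    """Actor step: set pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    logits = Q / alpha
    # Subtract the row maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)


def soft_policy_iteration(P, R, alpha=0.5, gamma=0.9, rounds=50):
    """Alternate policy evaluation and policy improvement."""
    n_states, n_actions = R.shape
    # Start from the uniform policy.
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    Q = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        Q = soft_policy_evaluation(P, R, pi, alpha, gamma)
        pi = soft_policy_improvement(Q, alpha)
    return Q, pi
```

In a deep RL setting the table `Q` would be replaced by a learned critic and the softmax by a parameterized actor, but the two-step structure is the same: the critic is fitted to the soft Q-function of the current policy, and the actor is then moved toward the Boltzmann distribution induced by that critic.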
Feb-15-2026, 12:20:42 GMT
- Country:
- Asia
- Middle East > Jordan (0.04)
- Taiwan (0.04)
- Europe > Portugal
- North America > United States
- California > Santa Clara County > Santa Clara (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Technology: