Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Neural Information Processing Systems 

In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function.
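The alternation between the two steps can be illustrated with a minimal tabular sketch of soft policy iteration. This is not the method proposed in the paper: it assumes a small discrete MDP with known rewards `R` (shape states x actions) and transitions `P` (shape states x actions x states), a temperature `alpha`, and a discount `gamma`, all of which are hypothetical names chosen for illustration. The critic step applies the soft Bellman backup under a fixed policy, and the actor step sets the policy proportional to exp(Q / alpha).

```python
import numpy as np

def soft_policy_evaluation(Q, pi, R, P, gamma, alpha, sweeps=50):
    """Critic step: apply the soft Bellman backup under a fixed policy."""
    for _ in range(sweeps):
        # Soft state value: V(s) = E_{a~pi}[Q(s,a) - alpha * log pi(a|s)].
        # Clipping avoids log(0) if a probability underflows to zero.
        log_pi = np.log(np.clip(pi, 1e-12, None))
        V = np.sum(pi * (Q - alpha * log_pi), axis=1)
        # Q(s,a) = r(s,a) + gamma * E_{s'~P(.|s,a)}[V(s')]
        Q = R + gamma * P @ V
    return Q

def soft_policy_improvement(Q, alpha):
    """Actor step: pi(a|s) proportional to exp(Q(s,a) / alpha)."""
    logits = Q / alpha
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def soft_policy_iteration(R, P, gamma=0.9, alpha=0.5, iterations=200):
    """Alternate critic and actor steps, starting from a uniform policy."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(iterations):
        Q = soft_policy_evaluation(Q, pi, R, P, gamma, alpha)  # policy evaluation
        pi = soft_policy_improvement(Q, alpha)                 # policy improvement
    return Q, pi

# Hypothetical toy MDP: 3 states, 2 actions, random rewards and transitions.
rng = np.random.default_rng(0)
R = rng.uniform(0.0, 1.0, size=(3, 2))
P = rng.dirichlet(np.ones(3), size=(3, 2))  # shape (3, 2, 3); rows sum to 1
Q, pi = soft_policy_iteration(R, P)
```

In a deep actor-critic setting, the exact backup and the closed-form softmax would typically be replaced by gradient updates on sampled transitions; the tabular form is used here only because it makes the two steps explicit.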
