Flow-Based Policy for Online Reinforcement Learning

Jun-13-2026, 03:51:39 GMT–Neural Information Processing Systems

We argue that, in addition to training signals, enhancing the expressiveness of the policy class is crucial for performance gains in RL. Flow-based generative models offer this potential, as they excel at capturing complex, multimodal action distributions. However, applying them directly in online RL is challenging because of a fundamental objective mismatch: standard flow training optimizes for imitation of static data, whereas RL requires value-based policy optimization over a dynamic buffer, which leads to a difficult optimization landscape.

artificial intelligence, machine learning, proceedings

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.43)