GenPO Generative Diffusion Models Meet On Policy Reinforcement Learning

Jun-22-2026, 11:56:59 GMT–Neural Information Processing Systems

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., EulerMaruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible action mappings.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Jun-22-2026, 11:56:59 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found