Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens

Jun-13-2026, 19:07:23 GMT – Neural Information Processing Systems

Reinforcement learning (RL) has become an indispensable post-training step for unlocking the full potential of Large Language Models (LLMs). Its core motivation is to incentivize the model's inference trajectory via a reward model, effectively balancing the exploration-exploitation trade-off in scenarios where collecting exhaustive input-output ground-truth pairs is infeasible. This motivation naturally extends to visual generation, where perfect alignment between an image and a textual prompt is inherently ambiguous and often unattainable. However, existing visual generative models are not yet ready for RL due to the following two fundamental drawbacks that undermine the foundations of RL: 1) For diffusion-based models, the actual generation trajectories of sampled images cannot be reliably rewarded, as diffusion inversion is notoriously difficult.

large language model, natural language, proceedings


Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)