Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Neural Information Processing Systems 

This paper proposes UNIFIEDREWARD-THINK, the first unified multimodal chain-of-thought (CoT) reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks. Specifically, we adopt an exploration-driven reinforcement fine-tuning approach to elicit and incentivize the model's latent complex reasoning
