Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Jun-23-2026, 00:59:38 GMT – Neural Information Processing Systems

This paper proposes UNIFIEDREWARD-THINK, the first unified multimodal CoT-based reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks. Specifically, we adopt an exploration-driven reinforcement fine-tuning approach to elicit and incentivize the model's latent complex reasoning

large language model, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Leisure & Entertainment > Sports > Tennis (0.93)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.69)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (0.94)
      - Chatbot (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
