AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
Mengzhao Jia, Zhihan Zhang, Ignacio Cases, Zheyuan Liu, Meng Jiang, Peng Qi
arXiv.org Artificial Intelligence
Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning because it rewards only final-answer correctness. To address this limitation, we propose AutoRubric-R1V, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation lies in a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric-R1V achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
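To make the reward-combination idea concrete, below is a minimal Python sketch of how a process-level rubric score might be blended with an outcome reward. The `RubricItem`, `rubric_reward`, and `combined_reward` names, and the `alpha` weighting hyperparameter, are illustrative assumptions and not taken from the paper, which does not specify its exact aggregation formula in this abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricItem:
    """One problem-specific reasoning checkpoint distilled from successful trajectories."""
    description: str
    satisfied: bool  # whether the candidate trajectory covers this checkpoint


def rubric_reward(items: List[RubricItem]) -> float:
    """Fraction of rubric checkpoints the trajectory satisfies (process-level signal)."""
    if not items:
        return 0.0
    return sum(item.satisfied for item in items) / len(items)


def combined_reward(items: List[RubricItem], answer_correct: bool, alpha: float = 0.5) -> float:
    """Convex combination of the rubric-based (process) and outcome rewards.

    `alpha` is a hypothetical weighting hyperparameter for illustration only.
    """
    outcome = 1.0 if answer_correct else 0.0
    return alpha * rubric_reward(items) + (1.0 - alpha) * outcome


if __name__ == "__main__":
    rubric = [
        RubricItem("Identify the relevant objects in the image", satisfied=True),
        RubricItem("Set up the correct arithmetic relation", satisfied=True),
        RubricItem("Carry out the computation without skipping steps", satisfied=False),
    ]
    # A trajectory that reaches the right answer but misses one checkpoint
    # receives less than full reward, so faithfulness matters, not just the answer.
    print(combined_reward(rubric, answer_correct=True))  # ~0.83
```

Under this kind of blend, a trajectory that guesses the correct answer with unfaithful reasoning earns strictly less reward than one that also satisfies the rubric, which is the incentive the paper attributes to joint rubric-plus-outcome training.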
Oct-17-2025