DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning

Jun-13-2026, 21:34:30 GMT–Neural Information Processing Systems

Reasoning has substantially improved the performance of large language models (LLMs) on complex tasks. Central to current reasoning research, Process Reward Models (PRMs) evaluate intermediate reasoning steps at a fine granularity and guide the reasoning process. However, extending PRMs to multimodal large language models (MLLMs) introduces challenges. Because multimodal reasoning covers a wider range of tasks than text-only reasoning, the distribution shift between training and test sets is more severe, which makes generalization harder. Training a reliable multimodal PRM therefore demands large and diverse datasets to ensure sufficient coverage.

artificial intelligence, large language model, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.81)
  - Cognitive Science > Problem Solving (0.58)