GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
Wang, Chenglong; Mu, Yongyu; Zhou, Hang; Huo, Yifu; Zhu, Ziming; Zeng, Jiali; Yang, Murun; Li, Bei; Hao, Xiaoyang; Zhang, Chunliang; Meng, Fandong; Zhu, Jingbo; Xiao, Tong
arXiv.org Artificial Intelligence
Significant progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs towards generalist reward models. Despite this trend, developing effective reward models still faces a fundamental challenge: heavy reliance on large-scale labeled preference data. Pre-training on abundant unlabeled data offers a promising direction, but existing approaches fall short of instilling explicit reasoning into reward models. To bridge this gap, we propose a self-training approach that leverages unlabeled data to elicit reward reasoning in reward models. Based on this approach, we develop GRAM-R$^2$, a generative reward model trained to produce not only preference labels but also accompanying reward rationales. GRAM-R$^2$ can serve as a foundation model for reward reasoning: it can be applied to a wide range of tasks with minimal or no additional fine-tuning, and it supports downstream applications such as response ranking and task-specific reward tuning. Experiments on response ranking, task adaptation, and reinforcement learning from human feedback demonstrate that GRAM-R$^2$ consistently performs well, outperforming several strong discriminative and generative baselines.
Nov-18-2025
- Country:
  - Asia
    - China > Liaoning Province
      - Shenyang (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - South Korea > Seoul
      - Seoul (0.04)
  - Europe
  - North America
    - Canada
      - Alberta > Census Division No. 19
        - Saddle Hills County (0.04)
      - British Columbia > Vancouver (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > Santa Clara County
        - Palo Alto (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Washington > King County
        - Seattle (0.14)
- Genre:
  - Research Report > New Finding (0.67)
- Industry:
  - Law (0.67)
- Technology: