GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
