NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
