GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
