SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
