Uncalibrated Reasoning: GRPO Induces Overconfidence for Stochastic Outcomes