Incentivizing Strong Reasoning from Weak Supervision