Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models