Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients