The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training