Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning