Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO