Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
