Reinforcement learning fine-tuning of a language model for instruction following and math reasoning