Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling