Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards