Logical Reasoning with Outcome Reward Models for Test-Time Scaling
