Re-FORC: Adaptive Reward Prediction for Efficient Chain-of-Thought Reasoning
