CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models

Open in new window