FreePRM: Training Process Reward Models Without Ground Truth Process Labels
