VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
