VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data