Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing