DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering
