Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi, Ahmed Attia, Abdalgader Abubaker, Mohamed El Amine Seddik, Hakim Hacid, Salem Lahlou
arXiv.org Artificial Intelligence
Achieving both accuracy and diversity in reasoning remains challenging for Large Language Models (LLMs) in complex domains like mathematics. A key bottleneck is evaluating intermediate reasoning steps to guide generation without costly human annotations. To address this, we first introduce a novel Process Reward Model (PRM) trained automatically using Monte Carlo Tree Search coupled with a similarity-based data augmentation technique, effectively capturing step-level reasoning quality. Leveraging this PRM, we then adapt Generative Flow Networks (GFlowNets) to operate at the reasoning step level. Unlike traditional reinforcement learning, which focuses on maximizing a single reward, GFlowNets naturally sample diverse, high-quality solutions in proportion to their rewards, as measured by our PRM. Empirical evaluation shows strong improvements in both accuracy and solution diversity on challenging mathematical benchmarks (e.g., +2.59% absolute accuracy on MATH Level 5 for Llama3.2-3B), with effective generalization to unseen datasets (+9.4% absolute on SAT MATH). Furthermore, we benchmark our PRM against existing open-source reward models, demonstrating superior alignment with reasoning quality and more consistent guidance for downstream generation. Our work demonstrates the potential of PRM-guided, step-level GFlowNets for developing more robust and versatile mathematical reasoning in LLMs.
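The key distinction the abstract draws is between reward-maximizing RL, which collapses onto a single best solution, and GFlowNet-style sampling, where solutions are drawn with probability proportional to their reward. The following minimal sketch illustrates that sampling property only; the solution labels and PRM scores are hypothetical, and this is not the paper's implementation.

```python
import random

def sample_proportional(candidates, rewards, n=1000, seed=0):
    """Draw candidates with probability proportional to reward,
    mimicking the GFlowNet objective P(x) proportional to R(x)."""
    rng = random.Random(seed)
    total = sum(rewards)
    probs = [r / total for r in rewards]
    return [rng.choices(candidates, weights=probs, k=1)[0] for _ in range(n)]

# Hypothetical PRM scores for three distinct solution paths to one problem.
solutions = ["algebraic", "geometric", "numeric"]
prm_scores = [0.6, 0.3, 0.1]

draws = sample_proportional(solutions, prm_scores, n=10000)
freqs = {s: draws.count(s) / len(draws) for s in solutions}

# Empirical frequencies approximate the reward proportions 0.6 / 0.3 / 0.1,
# whereas a reward-maximizing policy would return only "algebraic".
print(freqs)
```

The diversity benefit follows directly: lower-reward but still-valid reasoning paths retain nonzero sampling mass instead of being discarded.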
Oct-14-2025