Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning

Ho, Matthew, Zhu, Vincent, Chen, Xiaoyin, Jain, Moksh, Malkin, Nikolay, Zhang, Edwin

Oct-17-2024, arXiv.org Artificial Intelligence

Reasoning is a fundamental substrate for solving novel and complex problems. Deliberate efforts to learn and develop frameworks around System 2 reasoning have made great strides, yet problems of sufficient complexity remain largely out of reach for open models. To address this gap, we examine the potential of Generative Flow Networks [GFlowNets; Bengio et al., 2021; Hu et al., 2024] as a fine-tuning method for LLMs to unlock advanced reasoning capabilities. In this paper, we present a proof of concept in the domain of formal reasoning, specifically in the Neural Theorem Proving (NTP) setting, where proofs specified in a formal language such as Lean can be deterministically and objectively verified. Unlike classical reward-maximization reinforcement learning, which frequently over-exploits high-reward actions and fails to explore the state space effectively, GFlowNets have emerged as a promising approach for sampling compositional objects, improving generalization, and enabling models to maintain diverse hypotheses. Our early results demonstrate GFlowNet fine-tuning's potential for enhancing model performance in a search setting, which is especially relevant given the paradigm shift towards inference-time compute scaling and "thinking slowly."
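The contrast the abstract draws between reward maximization and GFlowNet-style sampling can be illustrated with a toy sketch. It is not the paper's method: the candidate names, reward values, and function names below are hypothetical, and the code only shows the idealized target behavior (sampling in proportion to reward) rather than any training procedure.

```python
import random
from collections import Counter

# Hypothetical toy rewards for four candidate proof strategies (not from the paper).
REWARDS = {"induction": 4.0, "rewrite": 3.0, "case_split": 2.0, "simp": 1.0}


def greedy_choice(rewards):
    """Reward maximization: always return the single highest-reward candidate."""
    return max(rewards, key=rewards.get)


def target_distribution(rewards):
    """GFlowNet-style target: each candidate's probability is proportional to its reward."""
    total = sum(rewards.values())
    return {name: r / total for name, r in rewards.items()}


def sample_proportional(rewards, n, seed=0):
    """Draw n candidates with probability proportional to reward and count them."""
    rng = random.Random(seed)
    names = list(rewards)
    weights = [rewards[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


if __name__ == "__main__":
    print("greedy:", greedy_choice(REWARDS))
    print("target:", target_distribution(REWARDS))
    print("samples:", sample_proportional(REWARDS, 1000))
```

The greedy policy commits to one candidate, whereas proportional sampling keeps lower-reward candidates in play, which is the sense in which diverse hypotheses are maintained during search.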

large language model, machine learning, trajectory

Country:
- North America > United States > California (0.68)

Genre:
- Research Report
  - New Finding (0.34)
  - Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Natural Language > Large Language Model (0.69)
  - Representation & Reasoning (1.00)