Goto

Collaborating Authors

 reasoning step


AutoPSV: Automated Process-Supervised Verifier

Neural Information Processing Systems

This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward.We detect relative changes in the verification's confidence scores across reasoning steps to automatically annotate the reasoning process, enabling error detection even in scenarios where ground truth answers are unavailable. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches.We experimentally validate that the step-level confidence changes learned by the verification model trained on the final answer correctness can effectively identify errors in the reasoning steps.We demonstrate that the verification model, when trained on process annotations generated by \textsc{AutoPSV}, exhibits improved performance in selecting correct answers from multiple LLM-generated outputs.Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning.




ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search Dan Zhang

Neural Information Processing Systems

Recent methodologies in LLM self-training mostly rely on LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning).




Deductive Verification of Chain-of-Thought Reasoning

Neural Information Processing Systems

To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps where subsequent steps are more rigorously grounded on prior steps.



CanLanguageModels LearntoSkipSteps?

Neural Information Processing Systems

Yet they are still far from true intelligence, which opens up intriguing opportunities to explore the parallels of humans and modelbehaviors.