Rational Metareasoning for Large Language Models

De Sabbata, C. Nicolò, Sumers, Theodore R., Griffiths, Thomas L.

arXiv.org Artificial Intelligence 

Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs), deploying additional inference-time compute to improve task performance. However, as LLMs increase in both size and adoption, inference costs are becoming increasingly burdensome. This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting and STaR, our method significantly reduces inference costs (20-37% fewer tokens generated across three models) while maintaining task performance across diverse datasets.

Large language models (LLMs) rely on substantial computational power to handle complex problems (OpenAI et al., 2024; Chowdhery et al., 2022; de Vries, 2023). While initial studies mostly focused on the cost of training (Verdecchia et al., 2023), the widespread deployment of LLMs has made inference-time costs an increasingly important factor. There is, however, a fundamental tension between inference cost and task performance: methods such as model compression reduce costs at the expense of performance, while others, such as chain-of-thought prompting (CoT; Wei et al., 2023; Kojima et al., 2023), do the opposite, raising inference costs to enhance task performance (Snell et al., 2024). Notably, none of these approaches are adaptive: model compression and existing CoT methods raise or lower the inference cost on all queries, regardless of task complexity.
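To make the described procedure concrete, below is a minimal Python sketch of one plausible reading of it: a Value-of-Computation style reward that pays for a correct answer but charges per reasoning token, combined with an Expert Iteration style filter that keeps only the highest-reward completion per prompt for fine-tuning. All names and constants here (voc_reward, select_for_finetuning, token_cost) are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Tuple

def voc_reward(correct: bool, n_reasoning_tokens: int,
               token_cost: float = 0.001) -> float:
    """Value-of-Computation style reward (sketch): task reward minus a
    cost proportional to reasoning length, so intermediate reasoning
    only pays off when it improves the answer by more than its cost.
    `token_cost` is an illustrative hyperparameter."""
    return (1.0 if correct else 0.0) - token_cost * n_reasoning_tokens

def select_for_finetuning(
    samples: List[Tuple[str, str, int, bool]],
    token_cost: float = 0.001,
) -> List[Tuple[str, str]]:
    """Expert Iteration filter (sketch): each sample is
    (prompt, completion, n_reasoning_tokens, correct). Keep the
    highest-reward completion per prompt, provided its reward is
    positive; the kept pairs form the next fine-tuning dataset."""
    best: Dict[str, Tuple[str, float]] = {}
    for prompt, completion, n_tokens, correct in samples:
        r = voc_reward(correct, n_tokens, token_cost)
        if prompt not in best or r > best[prompt][1]:
            best[prompt] = (completion, r)
    return [(p, c) for p, (c, r) in best.items() if r > 0]
```

Because the reward decreases with reasoning length, this filter prefers a correct direct answer over an equally correct chain of thought, which is how such a scheme could learn to deploy intermediate reasoning only when it is actually needed.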
