An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
arXiv.org Artificial Intelligence
Prior studies have demonstrated how model performance is influenced by both model size and the amount of training computation. However, little is known about how varying the compute spent at inference time affects performance after a model has been trained. To improve the task performance of large language models (LLMs), inference techniques typically spend additional computation as a performance-maximization step at inference time [Nye et al., 2021, Wei et al., 2022, Wang et al., 2022b, Yao et al., 2023, Chen et al., 2024b]. This cost must be taken into account for compute-optimal inference. For example, a Monte Carlo Tree Search (MCTS) method [Jones, 2021] may improve task performance, but potentially requires much more compute than simply sampling solutions multiple times. More generally, we need a comprehensive understanding of how various inference-time methods (e.g., Best-of-N, Majority Voting) trade off performance against cost. To that end, this paper presents a thorough empirical evaluation, with careful analysis, over various configurations of representative LLMs and inference algorithms. Specifically, we explore how to select an optimal language model size and an effective inference strategy (e.g., Greedy Search, Majority Voting, Best-of-N, Weighted Voting, and their Tree Search variants) to maximize performance (i.e., accuracy) within a given compute budget.
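To make the voting-based strategies named above concrete, here is a minimal sketch, not taken from the paper itself, of how Majority Voting, Best-of-N, and Weighted Voting aggregate N sampled solutions. The reward-model scores and sample answers are hypothetical; in practice the answers would be final answers extracted from sampled LLM solutions and the scores would come from a trained verifier or reward model.

```python
from collections import Counter

def majority_voting(answers):
    """Pick the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Pick the answer from the single sample with the highest reward score."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

def weighted_voting(answers, scores):
    """Sum reward scores per distinct answer; pick the answer with the
    highest total, blending frequency (voting) with quality (scores)."""
    totals = {}
    for ans, score in zip(answers, scores):
        totals[ans] = totals.get(ans, 0.0) + score
    return max(totals, key=totals.get)

# Hypothetical example: three sampled solutions with verifier scores.
answers = ["7", "7", "9"]
scores = [0.4, 0.3, 0.6]
print(majority_voting(answers))        # "7": most frequent answer
print(best_of_n(answers, scores))      # "9": highest single score (0.6)
print(weighted_voting(answers, scores))# "7": total 0.7 beats 0.6
```

The three strategies can disagree, as in this example, which is exactly why their performance-versus-compute trade-offs need empirical comparison: each consumes the same N samples but aggregates them differently (Weighted Voting additionally requires scoring every sample).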
Aug-1-2024