An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
arXiv.org Artificial Intelligence
Prior studies have demonstrated how model performance is influenced by both model size and the amount of training computation. However, little is known about how varying the compute spent at inference time affects performance after a model has been trained. To improve the task performance of large language models (LLMs), inference techniques typically spend additional computation as a performance-maximization step at inference time [Nye et al., 2021, Wei et al., 2022, Wang et al., 2022b, Yao et al., 2023, Chen et al., 2024b]. This cost must be taken into account for compute-optimal inference. For example, a Monte Carlo Tree Search (MCTS) method [Jones, 2021] may improve task performance, but potentially requires much more compute than simply sampling solutions multiple times. More generally, we need a comprehensive understanding of how various inference-time methods (e.g., Best-of-N, Majority Voting) trade off performance against cost. To that end, this paper presents a thorough empirical evaluation, with careful analysis, over various configurations of representative LLMs and inference algorithms. Specifically, we explore how to select an optimal language model size and an effective inference strategy (e.g., Greedy Search, Majority Voting, Best-of-N, Weighted Voting, and their Tree Search variants) to maximize performance (i.e., accuracy) within a given compute budget.
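To make the voting-based strategies named above concrete, here is a minimal sketch, not taken from the paper itself, of how Majority Voting, Best-of-N, and Weighted Voting aggregate N sampled solutions. The reward-model scores and sample answers are hypothetical; in practice the answers would be final answers extracted from sampled LLM solutions and the scores would come from a trained verifier or reward model.

```python
from collections import Counter

def majority_voting(answers):
    """Pick the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Pick the answer from the single sample with the highest reward score."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

def weighted_voting(answers, scores):
    """Sum reward scores per distinct answer; pick the answer with the
    highest total, blending frequency (voting) with quality (scores)."""
    totals = {}
    for ans, score in zip(answers, scores):
        totals[ans] = totals.get(ans, 0.0) + score
    return max(totals, key=totals.get)

# Hypothetical example: three sampled solutions with verifier scores.
answers = ["7", "7", "9"]
scores = [0.4, 0.3, 0.6]
print(majority_voting(answers))        # "7": most frequent answer
print(best_of_n(answers, scores))      # "9": highest single score (0.6)
print(weighted_voting(answers, scores))# "7": total 0.7 beats 0.6
```

The three strategies can disagree, as in this example, which is exactly why their performance-versus-compute trade-offs need empirical comparison: each consumes the same N samples but aggregates them differently (Weighted Voting additionally requires scoring every sample).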
Aug-1-2024