Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Jakub Krajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram
–arXiv.org Artificial Intelligence
Large Language Models (OpenAI et al., 2024; Team et al., 2025; DeepSeek-AI et al., 2025) based on the Transformer (Vaswani et al., 2023) architecture have achieved impressive results, approaching or exceeding human-level performance across multiple domains. Scaling laws (Hestness et al., 2017; Kaplan et al., 2020) are an established method for modeling the performance of these networks, enabling researchers to plan large-scale training runs based on curated sets of smaller experiments. Traditionally, these laws focus on predicting proxy metrics for model quality, such as pre-training log-perplexity. This has proven invaluable for optimizing training hyperparameters, like the optimal ratio of tokens to parameters. Another important direction in understanding the scaling of LLMs is tracking the behavior of more interpretable indicators of model capabilities, like accuracy on downstream benchmarks measuring performance on general-knowledge, reasoning, math, and coding tasks. Despite early attempts to solve this problem (Grattafiori et al., 2024; Isik et al., 2025; Chen et al., 2025), scaling laws for downstream metrics have often been described as noisy and unreliable (Schaeffer et al., 2025; Lourie et al., 2025). Current approaches to modeling the downstream performance of LLMs (Grattafiori et al., 2024; Chen et al., 2025; Bhagia et al., 2024) typically rely on a two-stage approach, where the training budget is first mapped to a proxy metric, like the mean log-probability of the correct answer, and then a second dependence is established, mapping the proxy to benchmark accuracy.

Work done as an intern at Apple.
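The two-stage approach described above can be sketched as the composition of two maps: a power law from training compute to a proxy metric, followed by a sigmoidal map from the proxy to benchmark accuracy. The sketch below is purely illustrative; the functional forms are the ones commonly assumed in this line of work, and all parameter values (`a`, `b`, `irreducible`, `midpoint`, `steepness`, `chance`) are hypothetical, not fitted to any real data.

```python
import math

def proxy_loss(compute_flops, a=2.5, b=0.05, irreducible=1.7):
    """Stage 1: power-law map from training compute to a proxy metric,
    e.g. mean log-perplexity of the correct answer. Parameters are
    illustrative placeholders, not fitted values."""
    return a * compute_flops ** (-b) + irreducible

def benchmark_accuracy(loss, midpoint=2.0, steepness=4.0, chance=0.25):
    """Stage 2: sigmoidal map from the proxy loss to benchmark accuracy.
    Accuracy saturates at the chance level (e.g. 0.25 for 4-way
    multiple choice) when loss is high, and approaches 1.0 as loss falls."""
    p = 1.0 / (1.0 + math.exp(steepness * (loss - midpoint)))
    return chance + (1.0 - chance) * p

# Composing the two stages gives a direct compute -> accuracy prediction.
for c in (1e20, 1e22, 1e24):
    loss = proxy_loss(c)
    acc = benchmark_accuracy(loss)
    print(f"compute={c:.0e}  proxy_loss={loss:.3f}  accuracy={acc:.3f}")
```

With these placeholder parameters, increasing compute monotonically lowers the proxy loss and raises predicted accuracy, which is the qualitative behavior the two-stage fits aim to capture.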
Dec-10-2025