Prompt Engineering Large Language Models' Forecasting Capabilities

Schoenegger, Philipp, Jones, Cameron R., Tetlock, Philip E., Mellers, Barbara

arXiv.org Artificial Intelligence 

Forecasting future events has significant decision-relevance: a well-calibrated probabilistic estimate of the risk of a future pandemic, a conflict, or an emerging technology is crucial for making decisions under uncertainty. Current best practices for forecasting rely on aggregating the judgemental forecasts of experienced forecasters (Tetlock & Gardner 2016), a process that is both lengthy and expensive, though it promises to produce valuable input into decision-making processes (Mellers et al. 2019; Tetlock et al. 2014). Recent work has applied frontier large language models (LLMs) to forecasting, testing a variety of research questions, such as whether LLMs can match human forecasting performance, what determines their prediction capabilities, and how these capabilities may be improved. For example, previous work has examined retrieval-augmented systems (Halawi et al. 2024), aggregation of multiple models (Schoenegger et al. 2024), ranking-based context retrieval systems (Yan et al. 2024), and applications of reinforcement learning (Turtel et al. 2025b). While many of these approaches have increased forecasting performance, frontier models still trail aggregates of experienced forecasters on ForecastBench (Karger et al. 2024). Many such approaches have focused on specific aspects of forecasting pipeline design, such as effective news aggregation (Wang et al. 2025) or fine-tuning on model self-play output (Turtel et al. 2025).
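To make the aggregation step concrete, one common way to pool judgemental probability forecasts is to average them in log-odds space, optionally extremizing the result toward 0 or 1. The sketch below is purely illustrative: the function name, the `extremize` parameter, and the clipping constant are assumptions for this example, not a method from the paper.

```python
import math

def aggregate_forecasts(probs, extremize=1.0):
    """Pool probability forecasts by averaging in log-odds space.

    `extremize` > 1.0 pushes the pooled forecast toward 0 or 1;
    1.0 yields a plain log-odds average. All names and defaults
    here are illustrative assumptions, not from the paper.
    """
    eps = 1e-6  # clip to avoid infinite log-odds at exactly 0 or 1
    logits = []
    for p in probs:
        p = max(eps, min(1 - eps, p))
        logits.append(math.log(p / (1 - p)))
    pooled = extremize * sum(logits) / len(logits)
    return 1 / (1 + math.exp(-pooled))
```

With `extremize=1.0` the pool of identical forecasts returns that forecast unchanged; larger values sharpen a confident consensus, which is one reason extremized pooling is popular in the judgemental forecasting literature.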
