Prompt Engineering Large Language Models' Forecasting Capabilities

Schoenegger, Philipp, Jones, Cameron R., Tetlock, Philip E., Mellers, Barbara

arXiv.org Artificial Intelligence 

Forecasting future events has significant decision-relevance: a well-calibrated probabilistic estimate of the risk of a future pandemic, a conflict, or an emerging technology is crucial for making decisions under uncertainty. Current best practices for forecasting rely on aggregating the judgemental forecasts of experienced forecasters (Tetlock & Gardner 2016), a process that is both lengthy and expensive, though it promises to produce valuable input into decision-making processes (Mellers et al. 2019; Tetlock et al. 2014). Recent work has applied frontier large language models (LLMs) to forecasting, testing a variety of research questions, such as whether LLMs can match human forecasting performance, what determines their prediction capabilities, and how these capabilities may be improved. For example, previous work has examined retrieval-augmented systems (Halawi et al. 2024), aggregation of multiple models (Schoenegger et al. 2024), ranking-based context retrieval systems (Yan et al. 2024), and applications of reinforcement learning (Turtel et al. 2025b). While many of these approaches have increased forecasting performance, frontier models still trail aggregates of experienced forecasters on ForecastBench (Karger et al. 2024). Many such approaches have focused on specific aspects of forecasting pipeline design, such as effective news aggregation (Wang et al. 2025) or fine-tuning on model self-play output (Turtel et al. 2025).
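To make the aggregation step concrete, one common way to pool judgemental probability forecasts is to average them in log-odds space, optionally extremizing the result toward 0 or 1. The sketch below is purely illustrative: the function name, the `extremize` parameter, and the clipping constant are assumptions for this example, not a method from the paper.

```python
import math

def aggregate_forecasts(probs, extremize=1.0):
    """Pool probability forecasts by averaging in log-odds space.

    `extremize` > 1.0 pushes the pooled forecast toward 0 or 1;
    1.0 yields a plain log-odds average. All names and defaults
    here are illustrative assumptions, not from the paper.
    """
    eps = 1e-6  # clip to avoid infinite log-odds at exactly 0 or 1
    logits = []
    for p in probs:
        p = max(eps, min(1 - eps, p))
        logits.append(math.log(p / (1 - p)))
    pooled = extremize * sum(logits) / len(logits)
    return 1 / (1 + math.exp(-pooled))
```

With `extremize=1.0` the pool of identical forecasts returns that forecast unchanged; larger values sharpen a confident consensus, which is one reason extremized pooling is popular in the judgemental forecasting literature.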
