Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
arXiv.org Artificial Intelligence
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, LLMs struggled to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against top forecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of experts.
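For readers unfamiliar with the metric, the following is a minimal sketch of the standard Brier score for binary questions: the mean squared difference between the forecast probability and the 0/1 outcome, where lower is better. The function name `brier_score` and the sample numbers are illustrative assumptions, not data or code from the paper, and the paper may use a different variant or aggregation.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    forecasts: probabilities in [0, 1] that each event resolves "yes".
    outcomes:  1 if the event happened, 0 otherwise.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    for p in forecasts:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each forecast must be a probability in [0, 1]")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts on three questions (made-up numbers).
example_forecasts = [0.9, 0.2, 0.5]
example_outcomes = [1, 0, 1]
example_score = brier_score(example_forecasts, example_outcomes)

# A forecaster who always answers 0.5 scores 0.25 regardless of outcomes,
# which is a common reference point when reading Brier scores.
uninformed_score = brier_score([0.5, 0.5, 0.5], example_outcomes)
```

Under this definition a perfect forecaster scores 0 and a maximally wrong one scores 1, so a model "surpassing" the crowd means achieving a lower score on the same questions.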
Aug-6-2025
- Country:
- North America > United States (1.00)
- Genre:
- Research Report (0.51)
- Industry:
- Health & Medicine (1.00)
- Banking & Finance (0.93)
- Food & Agriculture (0.69)
- Law (0.67)
- Government > Regional Government
- Technology: