Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
Coen Adler, Yuxin Chang, Felix Draxler, Samar Abdi, Padhraic Smyth
The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, despite the fact that calibration can be critical for many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than baseline models and tend not to be either systematically over- or under-confident, in contrast to the overconfidence often seen in other deep learning models.
Oct-21-2025
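The paper's central question is whether a model's stated uncertainty matches reality. A common way to probe this is to compare the nominal coverage of a prediction interval with its empirical coverage: an overconfident model produces intervals that are too narrow, so true values fall outside them more often than advertised. The sketch below is not from the paper; it is a minimal illustration of this idea on synthetic Gaussian data, with the 80% interval width (`z ≈ 1.2816`) taken from the standard normal distribution.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of observations falling inside the prediction interval."""
    return np.mean((y_true >= lower) & (y_true <= upper))

rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 1.0, size=10_000)  # synthetic "true" outcomes

# Central 80% interval for a standard normal: mean +/- 1.2816 * std
z = 1.2816

# A calibrated model reports the correct interval width.
cov_calibrated = empirical_coverage(y_true, -z, z)

# An overconfident model underestimates its uncertainty (interval too narrow),
# so empirical coverage falls well below the nominal 80%.
cov_overconfident = empirical_coverage(y_true, -0.5 * z, 0.5 * z)

print(f"nominal 80% -> calibrated model covers   {cov_calibrated:.3f}")
print(f"nominal 80% -> overconfident model covers {cov_overconfident:.3f}")
```

Comparing such nominal-versus-empirical coverage across many forecast horizons (including long-term autoregressive rollouts, as the paper does) reveals whether a model is systematically over- or under-confident.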