ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction
Yu, Han, Li, Kehan, Li, Dongbai, He, Yue, Zhang, Xingxuan, Cui, Peng
–arXiv.org Artificial Intelligence
Recently, increasing attention has been paid to Out-of-Distribution (OOD) performance prediction, whose goal is to predict the performance of trained models on unlabeled OOD test datasets, so that off-the-shelf trained models can be better leveraged and deployed in risk-sensitive scenarios. Although progress has been made in this area, evaluation protocols in previous literature are inconsistent, and most works cover only a limited number of real-world OOD datasets and types of distribution shifts. To provide convenient and fair comparisons for various algorithms, we propose the Out-of-Distribution Performance Prediction Benchmark (ODP-Bench), a comprehensive benchmark that includes the most commonly used OOD datasets and existing practical performance prediction algorithms. We provide our trained models as a testbench for future researchers, thus guaranteeing consistency of comparison and avoiding the burden of repeating the model training process. Furthermore, we conduct in-depth experimental analyses to better understand the capability boundaries of these algorithms.
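To make the task concrete: a performance predictor takes a trained model's outputs on an unlabeled test set and returns an estimated accuracy. A minimal sketch, using the classic average-confidence baseline (a common reference method in this literature, not ODP-Bench's own algorithm; the function names and toy logits are illustrative):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def average_confidence(logits):
    """Estimate a classifier's accuracy on an unlabeled (possibly OOD)
    test set as the mean of its maximum softmax probabilities."""
    probs = softmax(logits)
    return probs.max(axis=1).mean()

# Toy example: logits for 3 unlabeled test samples, 4 classes.
logits = np.array([[4.0, 0.5, 0.2, 0.1],
                   [0.3, 3.5, 0.2, 0.4],
                   [1.0, 1.1, 0.9, 1.0]])
print(average_confidence(logits))  # estimated accuracy, roughly 0.70 here
```

Benchmarks such as this one compare predictors by how close such estimates come to the model's true accuracy across many distribution shifts; under OOD shift the model is often overconfident, which is why more sophisticated predictors exist.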
Nov-3-2025