LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
Li, Fanfei, Klein, Thomas, Brendel, Wieland, Geirhos, Robert, Zimmermann, Roland S.
arXiv.org Artificial Intelligence
Out-of-distribution (OOD) robustness is a desirable property of computer vision models. Improving model robustness requires high-quality signals from robustness benchmarks to quantify progress. While various benchmark datasets such as ImageNet-C were proposed in the ImageNet era, most ImageNet-C corruption types are no longer OOD relative to today's large, web-scraped datasets, which already contain common corruptions such as blur or JPEG compression artifacts. Consequently, these benchmarks are no longer well-suited for evaluating OOD robustness in the era of web-scale datasets. Indeed, recent models show saturating scores on ImageNet-era OOD benchmarks, leaving it unclear whether models trained on web-scale datasets truly become better at OOD generalization or whether they have simply been exposed to the test distortions during training. To address this, we introduce LAION-C as an alternative benchmark to ImageNet-C. LAION-C consists of six novel distortion types specifically designed to be OOD, even for web-scale datasets such as LAION. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models, including multimodal large language models (MLLMs) such as Gemini and GPT-4o. We additionally conducted a psychophysical experiment to evaluate the difficulty of our corruptions for human observers, enabling a comparison of models to lab-quality human robustness data. We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.
Jun-23-2025
- Country:
- North America > United States (0.92)
- Genre:
- Research Report
- New Finding (0.67)
- Experimental Study (0.46)
- Industry:
- Law (1.00)
- Government (1.00)
- Information Technology (0.67)
- Health & Medicine > Therapeutic Area (0.46)
- Leisure & Entertainment > Sports (0.46)
- Technology: