No " Zero-Shot " Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
–Neural Information Processing Systems
Web-crawled datasets underlie the impressive "zero-shot" performance of multimodal models, such as CLIP for classification and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such models because the extent to which their pretraining datasets encompass downstream concepts used in "zero-shot" evaluation is unknown. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and 5 standard pretraining datasets, generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and evaluation datasets [81], and testing on purely synthetic data distributions [52]. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test dataset as the Let it Wag! benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training data and compute paradigms remains to be found.
Neural Information Processing Systems
Mar-22-2025, 11:46:36 GMT
- Country:
- Europe (0.67)
- North America > United States
- California (0.27)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Research Report
- Industry:
- Health & Medicine (0.67)
- Information Technology > Services (0.46)
- Law (0.67)
- Transportation
- Technology: