Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
Neural Information Processing Systems
In studies of transferable learning, scaling laws are derived for various important foundation models to predict their properties and performance at larger scales. Taking language-vision learning as an example, we show here how scaling-law derivation can also be used for model and dataset comparison, making it possible to decide which procedure should be preferred for pre-training. Full scaling laws, based on dense measurements across a wide range of model sizes and numbers of samples seen, are derived for two important language-vision learning procedures, CLIP and MaMMUT, which use either a contrastive loss only or both a contrastive and a captioning (text-generative) loss. For the first time, we use derived scaling laws to compare both models and three open datasets, DataComp-1.4B,
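The idea of comparing procedures through fitted scaling laws rather than through single measurements can be illustrated with a minimal Python sketch. It assumes a pure power law E(C) = a·C^(-b), where C is a scale variable such as compute or samples seen and E is an error metric, fitted by least squares in log-log space. The paper's actual functional form, fitting procedure, and measurements are not reproduced here. The names `PowerLaw`, `fit_power_law`, `preferred_at`, and `crossover`, and all numbers, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PowerLaw:
    """Assumed form E(C) = a * C**(-b); C is a scale variable, E an error metric."""
    a: float
    b: float

    def predict(self, c: float) -> float:
        return self.a * c ** (-self.b)


def fit_power_law(cs: Sequence[float], errs: Sequence[float]) -> PowerLaw:
    """Ordinary least squares on log E = log a - b * log C."""
    xs = [math.log(c) for c in cs]
    ys = [math.log(e) for e in errs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return PowerLaw(a=math.exp(my - slope * mx), b=-slope)


def preferred_at(fa: PowerLaw, fb: PowerLaw, c: float) -> str:
    """Name of the procedure with the lower predicted error at scale c."""
    return "A" if fa.predict(c) < fb.predict(c) else "B"


def crossover(fa: PowerLaw, fb: PowerLaw) -> Optional[float]:
    """Scale at which both fitted curves predict the same error (None if exponents match)."""
    if math.isclose(fa.b, fb.b):
        return None
    # a_A * C**(-b_A) == a_B * C**(-b_B)  =>  C = (a_B / a_A) ** (1 / (b_B - b_A))
    return (fb.a / fa.a) ** (1.0 / (fb.b - fa.b))


# Hypothetical, noise-free measurements generated from made-up parameters.
scales = [1e6, 1e7, 1e8, 1e9]
fit_a = fit_power_law(scales, [PowerLaw(2.0, 0.05).predict(c) for c in scales])
fit_b = fit_power_law(scales, [PowerLaw(3.0, 0.07).predict(c) for c in scales])

print(preferred_at(fit_a, fit_b, 1e6))   # A
print(preferred_at(fit_a, fit_b, 1e12))  # B
print(f"{crossover(fit_a, fit_b):.3g}")  # 6.38e+08
```

In this toy setup, procedure A looks better at small scale, but B's steeper exponent makes it preferable beyond the crossover, so a comparison at a single scale could point the wrong way. Fitting in log space assumes no irreducible error term; a form with an offset, such as E = a·C^(-b) + c, would instead need a nonlinear fit (for example `scipy.optimize.curve_fit`).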
Jun-22-2026, 23:48:56 GMT