SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification
–Neural Information Processing Systems
Our findings reveal several trends, particularly for recent data curation methods such as synthetic data generation and lookup based on CLIP embeddings. Although these strategies are highly competitive on certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark will help guide new methods toward narrowing this gap.
Feb-18-2026, 18:02:08 GMT