Non-Uniform Class-Wise Coreset Selection for Vision Model Fine-tuning

Hanyu Zhang, Zhen Xing, Ruian He, Wenxuan Yang, Chenxi Ma, Weimin Tan, Bo Yan

Nov-19-2025–arXiv.org Artificial Intelligence

Coreset selection aims to identify a small yet highly informative subset of data, enabling more efficient model training while reducing storage overhead. Recently, this capability has been leveraged to tackle the challenges of fine-tuning large foundation models, offering a direct pathway to their efficient and practical deployment. However, most existing methods are class-agnostic and therefore overlook significant variations in difficulty among classes. As a result, they disproportionately prune samples from classes that are either overly easy or overly hard, yielding a suboptimal allocation of the data budget that ultimately degrades coreset performance. To address this limitation, we propose Non-Uniform Class-Wise Coreset Selection (NUCS), a novel framework that integrates both class-level and sample-level difficulty. We propose a robust metric for global class difficulty, quantified as the winsorized average of per-sample difficulty scores. Guided by this metric, our method performs a theoretically grounded, non-uniform allocation of the data selection budget across classes, while adaptively selecting samples within each class from an optimal difficulty range. Extensive experiments on a wide range of visual classification tasks demonstrate that NUCS consistently outperforms state-of-the-art methods across 10 diverse datasets and pre-trained models, achieving both superior accuracy and computational efficiency and highlighting the promise of non-uniform class-wise selection strategies for advancing the efficient fine-tuning of large foundation models.
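The three ingredients named in the abstract (a winsorized class-difficulty score, a non-uniform per-class budget, and a per-class difficulty window) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: per-sample difficulty scores are assumed to be precomputed (for example, a training loss), the 10% winsorization limit is an assumed default, and the helpers `allocate_budget` and `select_in_class` are hypothetical placeholders. In particular, the proportional-to-difficulty allocation and the "skip the hardest 10%" window stand in for the paper's theoretically grounded allocation and adaptive difficulty range, which are not specified here.

```python
import math
from typing import Dict, List, Sequence


def winsorized_mean(scores: Sequence[float], limit: float = 0.1) -> float:
    """Mean of `scores` after clipping the lowest and highest `limit` fraction."""
    if not scores or not 0.0 <= limit < 0.5:
        raise ValueError("need non-empty scores and 0 <= limit < 0.5")
    s = sorted(scores)
    n = len(s)
    k = math.floor(limit * n)  # number of values clipped on each tail
    lo, hi = s[k], s[n - 1 - k]
    return sum(min(max(x, lo), hi) for x in s) / n


def allocate_budget(difficulty: Dict[str, float], sizes: Dict[str, int],
                    budget: int) -> Dict[str, int]:
    """HYPOTHETICAL rule: quota proportional to class difficulty, capped at
    class size; capacity left over by capped classes goes to the others."""
    if any(difficulty[c] <= 0 for c in sizes):
        raise ValueError("this sketch assumes strictly positive difficulties")
    alloc = {c: 0 for c in sizes}
    remaining = min(budget, sum(sizes.values()))
    while remaining > 0:
        open_cls = [c for c in sizes if alloc[c] < sizes[c]]
        total_w = sum(difficulty[c] for c in open_cls)
        share = {c: remaining * difficulty[c] / total_w for c in open_cls}
        given = 0
        for c in open_cls:
            add = min(int(share[c]), sizes[c] - alloc[c])
            alloc[c] += add
            given += add
        if given == 0:  # every share rounded down to 0: hand out single units
            for c in sorted(open_cls, key=share.get, reverse=True)[:remaining]:
                alloc[c] += 1
                given += 1
        remaining -= given
    return alloc


def select_in_class(scores: Sequence[float], quota: int,
                    skip_hardest: float = 0.1) -> List[int]:
    """HYPOTHETICAL difficulty window: rank samples hardest-first, skip the
    hardest `skip_hardest` fraction (possible noise), then take `quota`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    start = min(math.floor(skip_hardest * len(scores)), len(scores) - quota)
    return sorted(order[start:start + quota])


def select_coreset(per_class: Dict[str, List[float]], budget: int,
                   limit: float = 0.1,
                   skip_hardest: float = 0.1) -> Dict[str, List[int]]:
    """Class difficulty -> per-class quota -> per-class sample indices."""
    difficulty = {c: winsorized_mean(s, limit) for c, s in per_class.items()}
    sizes = {c: len(s) for c, s in per_class.items()}
    quotas = allocate_budget(difficulty, sizes, budget)
    return {c: select_in_class(per_class[c], quotas[c], skip_hardest)
            for c in per_class}


# Toy per-sample difficulty scores; the single 5.0 in "easy" is an outlier.
scores = {
    "easy": [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.2, 5.0],
    "hard": [0.9, 1.2, 1.0, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0],
}
print(select_coreset(scores, budget=10))
# -> {'easy': [3], 'hard': [0, 1, 2, 3, 4, 5, 6, 8, 9]}
```

Winsorizing is what keeps the single outlier from distorting the class-level metric: the plain mean of the "easy" scores is 0.65, while the winsorized mean is 0.18, so the budget is not diverted toward a class that only looks hard because of one sample.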

artificial intelligence, machine learning, natural language

Country:
- Asia > China
  - Shanghai > Shanghai (0.40)
- North America > United States
  - Florida > Miami-Dade County > Miami (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Health & Medicine (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (0.46)
    - Statistical Learning (0.68)
  - Natural Language (1.00)
  - Vision (1.00)