Layer-Aware Influence for Online Data Valuation Estimation
Ziao Yang, Longbo Huang, Hongfu Liu
–arXiv.org Artificial Intelligence
Data-centric learning emphasizes curating high-quality training samples to boost performance rather than designing new architectures. A central problem is to estimate the influence of training samples efficiently. Prior studies largely focus on static influence measured on a converged model, overlooking how data valuation changes dynamically during optimization, especially in deep models. To address the computational burden of frequent influence estimation, we develop a layer-aware online estimator that requires only loss-to-output gradients. This design avoids parameter-level and full-network gradients while preserving ranking fidelity. Extensive experiments across LLM pretraining, fine-tuning, and image classification show that our method improves accuracy with substantially lower time and memory cost, making dynamic data curation efficient and scalable in practice.
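To make the "loss-to-output gradients" idea concrete: for softmax cross-entropy, the gradient of the loss with respect to the model's output logits has the closed form softmax(logits) − onehot(label), so it can be computed from a forward pass alone, without backpropagating through any network parameters. The sketch below is an illustrative assumption, not the paper's exact estimator: it scores a training sample by the alignment (dot product) of its loss-to-output gradient with that of a validation sample; the function names and the scoring rule are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_to_output_grad(logits, label):
    # For softmax cross-entropy: d(loss)/d(logits) = softmax(logits) - onehot(label).
    # Requires only the model's outputs, never parameter-level gradients.
    g = softmax(logits).copy()
    g[label] -= 1.0
    return g

def influence_score(train_logits, train_label, val_logits, val_label):
    # Hypothetical proxy score: alignment of the two loss-to-output gradients.
    # A positive value suggests the training sample pushes outputs in a
    # direction that also reduces the validation loss.
    return float(np.dot(loss_to_output_grad(train_logits, train_label),
                        loss_to_output_grad(val_logits, val_label)))
```

Because each score needs only output-space vectors (dimension = number of classes, or vocabulary size for LLMs), re-estimating influence every few optimization steps stays cheap relative to methods that require full per-sample parameter gradients.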
Oct-21-2025