FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
Cui, Jin; Zhao, Boran; Xu, Jiajun; Guo, Jiaqi; Guan, Shuo; Ren, Pengju
Existing coreset selection methods are either: (i) DNN-based, which are inherently coupled with network-specific parameters, introducing architectural bias and compromising generalization; or (ii) DNN-free, which rely on heuristics that lack rigorous theoretical guarantees for stability and accuracy. Neither approach explicitly constrains the selected subset to be distributionally equivalent to the full dataset, largely because continuous distribution matching is broadly considered inapplicable to discrete dataset sampling. Furthermore, prevalent distribution metrics (e.g., MSE, KL, MMD, and CE) often fail to capture differences in higher-order moments accurately. These deficiencies lead to suboptimal coreset performance, preventing the selected coreset from being truly equivalent to the original dataset. We propose FAST (Frequency-domain Aligned Sampling via Topology), the first DNN-free distribution-matching coreset selection framework. FAST formulates the coreset selection task as a graph-constrained optimization problem grounded in spectral graph theory and employs the Characteristic Function Distance (CFD) to capture full distributional information (i.e., all moments and intrinsic correlations) in the frequency domain. We further discover that naive CFD suffers from a "vanishing phase gradient" issue in medium- and high-frequency regions; to address this, we introduce an Attenuated Phase-Decoupled CFD.
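To make the frequency-domain idea concrete, the following is a minimal sketch of a plain empirical characteristic function distance between a dataset and a candidate subset. It is an illustration under stated assumptions, not the authors' method: the function names (`empirical_cf`, `cfd`), the Gaussian sampling of frequencies, and the parameters `num_freqs`, `scale`, and `seed` are all hypothetical choices, and the sketch implements neither the graph-constrained optimization nor the Attenuated Phase-Decoupled CFD described above.

```python
import numpy as np


def empirical_cf(x, freqs):
    """Empirical characteristic function of samples x at the given frequencies.

    x:     array of shape (n, d), one sample per row.
    freqs: array of shape (m, d), one frequency vector per row.
    Returns a complex array of shape (m,): mean over samples of exp(i * <t, x>).
    """
    proj = x @ freqs.T                      # (n, m) inner products <t, x>
    return np.exp(1j * proj).mean(axis=0)   # average over the n samples


def cfd(x, y, num_freqs=256, scale=1.0, seed=0):
    """Monte Carlo estimate of a squared characteristic function distance.

    Assumption: frequencies are drawn from N(0, scale^2 I); this weighting
    is an illustrative choice, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.normal(scale=scale, size=(num_freqs, x.shape[1]))
    diff = empirical_cf(x, freqs) - empirical_cf(y, freqs)
    return float(np.mean(np.abs(diff) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(2000, 2))
    subset = data[rng.choice(len(data), size=500, replace=False)]
    shifted = data + 2.0
    print("random subset vs. data:", cfd(data, subset))
    print("shifted copy vs. data: ", cfd(data, shifted))
```

In this sketch, a uniformly sampled subset should score much closer to the full dataset than a shifted copy does, which is the property a distribution-matching selector would try to optimize. Because the characteristic function determines the distribution, the distance is sensitive to mismatches in any moment, not only the mean or variance.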
Nov-26-2025
- Country:
- Africa > Rwanda
- Asia > China
- Shaanxi Province > Xi'an (0.04)
- Europe
- North America
- Canada
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.04)
- British Columbia > Vancouver (0.04)
- Quebec > Montreal (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Nevada (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Tennessee > Davidson County
- Nashville (0.04)
- Washington > King County
- Seattle (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine > Diagnostic Medicine (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks (1.00)
- Statistical Learning (0.68)
- Natural Language (1.00)
- Representation & Reasoning > Optimization (1.00)
- Vision (1.00)