SAGE: Streaming Agreement-Driven Gradient Sketches for Representative Subset Selection

Oct-10-2025–arXiv.org Artificial Intelligence

Training modern neural networks on large datasets is computationally and energy intensive. We present SAGE, a streaming data-subset selection method that maintains a compact Frequent Directions (FD) sketch of gradient geometry in $O(\ell D)$ memory and prioritizes examples whose sketched gradients align with a consensus direction. The approach eliminates $N \times N$ pairwise similarities and explicit $N \times \ell$ gradient stores, yielding a simple two-pass, GPU-friendly pipeline. Leveraging FD's deterministic approximation guarantees, we analyze how agreement scoring preserves gradient energy within the principal sketched subspace. Across multiple benchmarks, SAGE trains with small kept-rate budgets while retaining competitive accuracy relative to full-data training and recent subset-selection baselines, and reduces end-to-end compute and peak memory. Overall, SAGE offers a practical, constant-memory alternative that complements pruning and model compression for efficient training.

artificial intelligence, machine learning, selection, (13 more...)

arXiv.org Artificial Intelligence

Oct-10-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > Wisconsin (0.14)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found