Neural Information Processing Systems
Recent empirical successes in large-scale machine learning have been powered by massive data parallelism and hardware acceleration, with batch sizes trending beyond 10K+ images [46] or 1M+ tokens [9]. Numerous interdisciplinary sources [5, 12, 24, 33] indicate that the performance bottlenecks of contemporary deep learning pipelines can lie in many places other than gradient computation.
Feb-8-2026, 23:01:51 GMT