AITopics | lossy activation compression

Supplemental Material for AC-GC: Lossy Activation Compression with Guaranteed Convergence

Neural Information Processing Systems, Feb-11-2026, 16:11:02 GMT

The appendices of this supplemental material provide detailed proofs (Appendix A), per-layer derivations for activation errors (Appendix B), algorithm and implementation details (Appendix C), datasets and hyperparameters (Appendix D), extended experimental data (Appendix E), and additional experiments (Appendix F) to accompany the main paper. A code example and trained models for CIFAR10/ResNet50 are available at https://github.com/rdevans0/acgc.

$L$ and $\eta$ depend on the model being trained and the dataset, and are thus problem-dependent constants.

Preliminary on Separation of Norms: Given two independent random vectors $A = (a_n) \in \mathbb{R}^N$ and $B = (b_n) \in \mathbb{R}^N$, where $\mathbb{E}[b_n] = 0 \;\forall n$.

Given $f$ which obeys (4), and a convex function $D(\Delta X)$ which bounds the gradient error from above for all $X$, $\theta$, and $\Delta X$; provided that $D(\Delta X) \le e^2 V^2$, the variance of the compressed gradients satisfies

$$\mathbb{E}\big[\|\hat{\nabla}_\theta f(\theta, X_{n_t})\|^2\big] \le (1 + e^2) V^2 \tag{16}$$

Proof.
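The excerpt above cuts off before the separation-of-norms result is stated, so the following is only a numerical sketch of the standard identity that such a preliminary usually refers to: for independent random vectors A and B with E[b_n] = 0, the cross term vanishes and E||A + B||^2 = E||A||^2 + E||B||^2. Read with A as the exact gradient and B as the compression error, this is one plausible route from an error bound of e^2 V^2 to the (1 + e^2) V^2 bound in (16). That reading is an assumption, not the authors' proof. The function names, distributions, and sample sizes below are hypothetical choices for illustration.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def separation_of_norms_demo(n_dim=8, n_samples=20000, seed=0):
    """Monte Carlo estimates of E||A||^2, E||B||^2 and E||A+B||^2.

    A and B are drawn independently; B has zero mean (the condition
    E[b_n] = 0), while A is allowed a nonzero mean.
    """
    rng = random.Random(seed)
    sum_a = sum_b = sum_ab = 0.0
    for _ in range(n_samples):
        a = [rng.gauss(1.0, 1.0) for _ in range(n_dim)]  # stand-in for a gradient
        b = [rng.gauss(0.0, 0.5) for _ in range(n_dim)]  # stand-in for a zero-mean error
        sum_a += sq_norm(a)
        sum_b += sq_norm(b)
        sum_ab += sq_norm([x + y for x, y in zip(a, b)])
    return sum_a / n_samples, sum_b / n_samples, sum_ab / n_samples


mean_a, mean_b, mean_ab = separation_of_norms_demo()
# Relative gap between E||A+B||^2 and E||A||^2 + E||B||^2; it should be
# small and shrink as n_samples grows.
rel_gap = abs(mean_ab - (mean_a + mean_b)) / mean_ab
print(f"E|A|^2={mean_a:.3f}  E|B|^2={mean_b:.3f}  E|A+B|^2={mean_ab:.3f}  gap={rel_gap:.4f}")
```

If B did not have zero mean, the cross term 2 E[A·B] would generally be nonzero and the two sides would no longer match, which is why the zero-mean condition appears in the preliminary.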

artificial intelligence, deep learning, machine learning, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.05)
North America > Canada > British Columbia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

AC-GC: Lossy Activation Compression with Guaranteed Convergence

Neural Information Processing Systems, Dec-25-2025, 03:11:54 GMT

Parallel hardware devices (e.g., graphics processor units) have limited high-bandwidth memory capacity. This negatively impacts the training of deep neural networks (DNNs) by increasing runtime and/or decreasing accuracy when reducing model and/or batch size to fit this capacity. Lossy compression is a promising approach to tackling memory capacity constraints, but prior approaches rely on hyperparameter search to achieve a suitable trade-off between convergence and compression, negating runtime benefits. In this paper we build upon recent developments on Stochastic Gradient Descent convergence to prove an upper bound on the expected loss increase when training with compressed activation storage. We then express activation compression error in terms of this bound, allowing the compression rate to adapt to training conditions automatically. The advantage of our approach, called AC-GC, over existing lossy compression frameworks is that, given a preset allowable increase in loss, significant compression without significant increase in error can be achieved with a single training run. When combined with error-bounded methods, AC-GC achieves 15.1x compression with an average accuracy change of 0.1% on text and image datasets. AC-GC functions on any model composed of the layers analyzed and, by avoiding compression rate search, reduces overall training time by 4.6x over SuccessiveHalving.
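To make the idea of an error-driven compression rate concrete, here is a minimal sketch of error-bounded uniform quantization: given an allowed mean squared error, it picks the smallest bit width that stays within that budget. This is not the AC-GC algorithm. In AC-GC the allowable activation error is derived from the convergence bound, whereas here `error_budget` is simply assumed to be given. All function names and parameters are hypothetical.

```python
import random


def quantize(values, bits):
    """Uniformly quantize values to 2**bits levels spanning their min/max range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant input: nothing to quantize
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def mean_sq_error(xs, ys):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def smallest_bitwidth(values, error_budget, max_bits=16):
    """Return the first bit width whose quantization error fits the budget.

    Falls back to max_bits if no smaller width satisfies the budget.
    """
    for bits in range(1, max_bits + 1):
        if mean_sq_error(values, quantize(values, bits)) <= error_budget:
            return bits
    return max_bits


rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic stand-in data

loose = smallest_bitwidth(activations, error_budget=1e-2)
tight = smallest_bitwidth(activations, error_budget=1e-4)
print(f"bits for budget 1e-2: {loose}, bits for budget 1e-4: {tight}")
```

A looser error budget never needs more bits than a tighter one, so a budget that changes during training translates directly into a changing compression rate without a separate hyperparameter search.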

ac-gc, lossy activation compression, name change, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

AC-GC: Lossy Activation Compression with Guaranteed Convergence

Neural Information Processing Systems, Jan-19-2025, 11:18:55 GMT

Parallel hardware devices (e.g., graphics processor units) have limited high-bandwidth memory capacity. This negatively impacts the training of deep neural networks (DNNs) by increasing runtime and/or decreasing accuracy when reducing model and/or batch size to fit this capacity. Lossy compression is a promising approach to tackling memory capacity constraints, but prior approaches rely on hyperparameter search to achieve a suitable trade-off between convergence and compression, negating runtime benefits. In this paper we build upon recent developments on Stochastic Gradient Descent convergence to prove an upper bound on the expected loss increase when training with compressed activation storage. We then express activation compression error in terms of this bound, allowing the compression rate to adapt to training conditions automatically. The advantage of our approach, called AC-GC, over existing lossy compression frameworks is that, given a preset allowable increase in loss, significant compression without significant increase in error can be achieved with a single training run.

ac-gc, convergence, lossy activation compression

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)
