The proposed LP filter is fundamentally different from previous weighted

Neural Information Processing Systems

Due to space constraints we only address major concerns; all suggestions will be included in the final version. Experimentally, we have observed that when using previous weighted … We will compare and cite related work (gTop-k) in the final draft. In Sec. 3 we assume mini-batch SGD has a small critical batch size, so that it approximates a full gradient descent iteration regardless of dataset size. Appendix F shows ScaleCom's scalability in system performance; more … Analogously, we perform filtering on the residual gradients (see Eq. (5)); the connection will be discussed in the revised version.
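To make the filtering idea concrete, a generic low-pass (exponential moving average) filter applied to a worker's residual gradient accumulation can be written as below. This is only an illustrative sketch: the smoothing coefficient beta and this exact update rule are assumptions, not the paper's Eq. (5).

```latex
% Illustrative low-pass (EMA) filter on worker i's residual accumulation.
% g_t^{(i)}: local stochastic gradient at step t; m_t^{(i)}: filtered residual;
% beta in [0,1) is a hypothetical smoothing coefficient.
\[
  m_t^{(i)} \;=\; \beta\, m_{t-1}^{(i)} + (1-\beta)\, g_t^{(i)}
\]
```

A larger beta damps step-to-step noise more strongly; the abstract attributes the improved large-batch behavior of ScaleCom to this kind of smoothing of the local accumulation.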


ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Neural Information Processing Systems

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing compression methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or lack evaluation on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that (i) leverages similarity in the gradient distribution amongst learners to provide a commutative compressor and keep communication cost constant with respect to the number of workers, and (ii) includes a low-pass filter in local gradient accumulation to mitigate the impact of large-batch training and significantly improve scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic, and provides high compression rates (70-150X) and excellent scalability (up to 64-80 learners and 10X larger batch sizes over normal training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
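As a rough illustration of the two ingredients named above (a top-k sparsifier with error feedback, plus a low-pass filter on the local accumulation), the following Python sketch shows one local compression step. It is a minimal sketch under assumed names (`top_k_mask`, `compress_step`, `beta`) and does not implement ScaleCom's commutative, cyclically coordinated selection across workers.

```python
# Minimal sketch (not ScaleCom's exact algorithm) of error-feedback top-k
# sparsification with a damped (low-pass) local residual accumulation.
import numpy as np

def top_k_mask(x, k):
    """Return a boolean mask selecting the k largest-magnitude entries of x."""
    idx = np.argpartition(np.abs(x), -k)[-k:]
    mask = np.zeros_like(x, dtype=bool)
    mask[idx] = True
    return mask

def compress_step(grad, residual, k, beta=0.9):
    """One local compression step for a single worker.

    grad     : current stochastic gradient of this worker
    residual : locally accumulated error carried over from previous steps
    Returns the sparse update to communicate and the new residual.
    """
    # Decay the carried-over residual (a simple low-pass / leaky accumulation).
    acc = beta * residual + grad
    # Keep only the k largest-magnitude entries; the rest stay in the residual.
    mask = top_k_mask(acc, k)
    sparse_update = np.where(mask, acc, 0.0)
    new_residual = np.where(mask, 0.0, acc)
    return sparse_update, new_residual
```

In an actual distributed run, each worker would communicate only the indices and values of `sparse_update`; ScaleCom additionally coordinates which indices are selected so that the compressor commutes with all-reduce, which this sketch omits.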


A Observations in Local Memory Similarity

Neural Information Processing Systems

We observed the local memory's similarity through Q-Q (quantile-quantile) plots, as shown in Figure A1. In Figure A1(a), the linearity of the points in the Q-Q plot suggests that worker 1's local … This is consistent with our observations on pairwise cosine distance shown in Figure 2(a), and indicates that we can possibly use a local worker's top-k …

One variant of Young's inequality is $\|x + y\|^2 \le (1+\epsilon)\|x\|^2 + (1+\epsilon^{-1})\|y\|^2$ for any $\epsilon > 0$ (A.1), and the quadrilateral identity is $\langle x, y \rangle = \tfrac{1}{2}\left(\|x\|^2 + \|y\|^2 - \|x - y\|^2\right)$; $f^*$ denotes the global minimum of $f(x)$.

We provide the following summary to explain Section 3's main results and connect them to other parts of the paper. Our Theorem 1 shows that ScaleCom's convergence rate matches SGD, which indicates its applicability in distributed training.

Lemma 1 (contraction property): the intuition is that higher correlation between workers brings CLT-k closer to the true top-k; Fig. 2 and 3 show high correlation between workers, so our contraction is close to the true top-k.
Lemma 2 (contraction in the distributed setting): requires positive correlation between workers; Fig. 2 and 3 show positive correlation between workers.
Theorem 1 (ScaleCom's convergence rate is the same as SGD, $O(1/\sqrt{T})$): Tables 1 and 2 (Fig. 4 and 5) verify that ScaleCom's convergence matches the baseline.

Each node is equipped with two IBM POWER9 processors clocked at 3.15 GHz.
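The similarity observations above can be quantified with pairwise cosine similarity and quantile-quantile comparisons. The snippet below is an assumed illustration (synthetic data, hypothetical helper names), not the paper's measurement scripts.

```python
# Illustrative check of similarity between two workers' local memories:
# pairwise cosine similarity and matched quantiles (the basis of a Q-Q plot).
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened local-memory vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def qq_points(a, b, num_quantiles=100):
    """Matched quantiles of a and b; points near the diagonal indicate
    similarly shaped distributions (what a Q-Q plot visualizes)."""
    q = np.linspace(0.0, 1.0, num_quantiles)
    return np.quantile(a, q), np.quantile(b, q)

# Synthetic residuals standing in for two (correlated) workers' local memories.
rng = np.random.default_rng(0)
mem_w1 = rng.normal(scale=1e-3, size=10_000)
mem_w2 = mem_w1 + rng.normal(scale=2e-4, size=10_000)
print("cosine similarity:", cosine_similarity(mem_w1, mem_w2))
qa, qb = qq_points(mem_w1, mem_w2)
```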


ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Yu Chen

Neural Information Processing Systems

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets.

