gradient sparsification
Broadly speaking, compression either involves quantization [33, 50, 27, 26, 28-31, 15, 32] to reduce the precision of transmitted information, or biased sparsification [24, 25, 35, 34, 51, 52, 49, 53] to transmit only a few components of a vector with the largest magnitudes. The DIANA technique was further generalized in [31] to account for a variety of compressors. For $0 < \eta \leq \frac{1}{L+\beta} < \frac{1}{L_i+\beta}, \, i \in \mathcal{S}$, we have $0 < 1 - \eta(\lambda_i + \beta) < 1$, and hence, $D$ is a symmetric positive-definite matrix. In this section, we will compile some results that will prove to be useful later in our analysis. We do so to set up the basic proof structure that we will later build on for analyzing more involved settings.
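To make the two compressor families above concrete, here is a minimal NumPy sketch (illustrative only; `topk_sparsify` and `stochastic_quantize` are our own names, not operators from the cited works). Top-$k$ keeps the $k$ largest-magnitude coordinates and is biased, while a QSGD-style stochastic quantizer reduces precision but can be made unbiased.

```python
import numpy as np

def topk_sparsify(v, k):
    """Biased sparsifier: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]   # indices of the k largest entries
    out[idx] = v[idx]
    return out

def stochastic_quantize(v, levels=4):
    """QSGD-style stochastic quantizer: round each coordinate to one of
    `levels` uniform levels of ||v|| with probabilities chosen so that the
    quantized vector equals v in expectation."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * levels          # position in [0, levels]
    lower = np.floor(scaled)
    round_up = np.random.rand(*v.shape) < (scaled - lower)
    return np.sign(v) * (lower + round_up) * norm / levels

np.random.seed(0)
g = np.random.randn(10)
print(topk_sparsify(g, k=3))       # only 3 nonzero entries, biased toward large ones
print(stochastic_quantize(g))      # low precision but unbiased in expectation
```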
Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients
We consider a standard federated learning (FL) setup where a group of clients periodically coordinate with a central server to train a statistical model. We develop a general algorithmic framework called FedLin to tackle some of the key challenges intrinsic to FL, namely objective heterogeneity, systems heterogeneity, and infrequent and imprecise communication. Our framework is motivated by the observation that under these challenges, various existing FL algorithms suffer from a fundamental speed-accuracy conflict: they either guarantee linear convergence but to an incorrect point, or convergence to the global minimum but at a sub-linear rate, i.e., fast convergence comes at the expense of accuracy. In contrast, when the clients' local loss functions are smooth and strongly convex, we show that FedLin guarantees linear convergence to the global minimum, despite arbitrary objective and systems heterogeneity. We then establish matching upper and lower bounds on the convergence rate of FedLin that highlight the effects of infrequent, periodic communication. Finally, we show that FedLin preserves linear convergence rates under aggressive gradient sparsification, and quantify the effect of the compression level on the convergence rate. Notably, our work is the first to provide tight linear convergence rate guarantees, and constitutes the first comprehensive analysis of gradient sparsification in FL.
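As a rough illustration of how a gradient-correction term counteracts client drift under objective heterogeneity, consider the toy sketch below. It is not FedLin's exact procedure (and omits sparsification entirely); the correction form and all names are our illustrative assumptions.

```python
import numpy as np

def corrected_local_steps(x_bar, g_bar, grad_i, eta, num_steps):
    """One client's local updates between communication rounds.  The correction
    (grad_i(x) - grad_i(x_bar) + g_bar) replaces the purely local gradient and
    counteracts client drift; corrections of this flavour underlie FedLin and
    related methods, though the exact forms differ."""
    x = x_bar.copy()
    for _ in range(num_steps):
        x -= eta * (grad_i(x) - grad_i(x_bar) + g_bar)
    return x

# Two heterogeneous clients with quadratic losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((5, 3)) for _ in range(2)]
b_list = [rng.standard_normal(5) for _ in range(2)]
grads = [lambda x, Ai=Ai, bi=bi: Ai.T @ (Ai @ x - bi) for Ai, bi in zip(A_list, b_list)]

x_bar = np.zeros(3)
for _ in range(100):                                      # communication rounds
    g_bar = np.mean([g(x_bar) for g in grads], axis=0)    # full gradient at x_bar
    local_models = [corrected_local_steps(x_bar, g_bar, g, eta=0.02, num_steps=10)
                    for g in grads]
    x_bar = np.mean(local_models, axis=0)                  # server averaging
print("global gradient norm after 100 rounds:",
      np.linalg.norm(np.mean([g(x_bar) for g in grads], axis=0)))
```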
Rethinking gradient sparsification as total error minimization
Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-$k$ sparsification, sometimes with $k$ as small as 0.1% of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization perspective, we find that Top-$k$ is the communication-optimal sparsifier given a per-iteration $k$-element budget. We argue that to further the benefits of gradient sparsification, especially for DNNs, a different perspective is necessary -- one that moves from per-iteration optimality to optimality over the entire training run. We identify that the total error -- the sum of the compression errors across all iterations -- encapsulates sparsification throughout training. We then propose a communication complexity model that minimizes the total error under a communication budget for the entire training. We find that the hard-threshold sparsifier, a variant of Top-$k$ in which $k$ is determined by a constant hard threshold, is the optimal sparsifier for this model. Motivated by this, we provide convex and non-convex convergence analyses for the hard-threshold sparsifier with error feedback. We show that hard-threshold has the same asymptotic convergence and linear speedup properties as SGD in both cases and, unlike Top-$k$, is not adversely affected by data heterogeneity. Our diverse experiments on various DNNs and a logistic regression model demonstrate that the hard-threshold sparsifier is more communication-efficient than Top-$k$.
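As a concrete illustration of the two sparsifiers under error feedback, consider the sketch below (a minimal sketch; `topk`, `hard_threshold`, and `ef_step` are our own names). Top-$k$ sends a fixed number of entries every iteration, whereas the hard-threshold sparsifier sends however many entries exceed the threshold, letting the communication budget be spent unevenly across training.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries (fixed per-iteration budget)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def hard_threshold(v, lam):
    """Keep every entry whose magnitude is at least lam; the number of
    transmitted entries varies from iteration to iteration."""
    return np.where(np.abs(v) >= lam, v, 0.0)

def ef_step(grad, memory, sparsifier):
    """Error feedback: sparsify the error-corrected gradient and carry the
    compression error (the contribution to the 'total error') to the next step."""
    corrected = grad + memory
    compressed = sparsifier(corrected)
    return compressed, corrected - compressed

rng = np.random.default_rng(0)
mem_topk, mem_ht = np.zeros(8), np.zeros(8)
for t in range(3):
    g = rng.standard_normal(8)
    u1, mem_topk = ef_step(g, mem_topk, lambda v: topk(v, k=2))
    u2, mem_ht = ef_step(g, mem_ht, lambda v: hard_threshold(v, lam=1.0))
    print(f"iter {t}: Top-k sends {np.count_nonzero(u1)} entries, "
          f"hard-threshold sends {np.count_nonzero(u2)} entries")
```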
Gradient Sparsification for Communication-Efficient Distributed Optimization
Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead of exchanging information, such as stochastic gradients, among the workers. In this paper, to reduce the communication cost, we propose a convex optimization formulation to minimize the coding length of stochastic gradients. The key idea is to randomly drop coordinates of the stochastic gradient vectors and amplify the remaining coordinates appropriately to ensure that the sparsified gradient remains unbiased. To compute the optimal sparsification efficiently, we propose several simple and fast algorithms for finding an approximate solution, with a theoretical guarantee on the sparseness.
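The "drop and amplify" construction can be sketched in a few lines: coordinate $i$ is kept with probability $p_i$ and rescaled by $1/p_i$, so the sparsified gradient is unbiased in expectation. The magnitude-proportional choice of $p_i$ below is one simple option used for illustration; the paper obtains the probabilities by solving a convex program that trades sparsity against variance.

```python
import numpy as np

def unbiased_sparsify(g, budget, rng):
    """Keep coordinate i with probability p_i and amplify it by 1/p_i, so that
    the output equals g in expectation.  Here p_i is proportional to |g_i| with
    roughly `budget` coordinates surviving on average (illustrative choice)."""
    p = np.minimum(1.0, budget * np.abs(g) / np.sum(np.abs(g)))
    keep = rng.random(g.shape) < p
    out = np.zeros_like(g)
    out[keep] = g[keep] / p[keep]
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(10)
avg = np.mean([unbiased_sparsify(g, budget=3, rng=rng) for _ in range(20000)], axis=0)
print("original:", np.round(g, 3))
print("average :", np.round(avg, 3))   # close to the original: unbiased
```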
A Related Work on Compression Techniques in Distributed Optimization and Learning
The analysis in this section follows the techniques introduced in [8]. The proof of Proposition 2 follows roughly the same steps as the proof of Proposition 1. In this section, we will compile some results that will prove to be useful later in our analysis. With this in mind, we will assume throughout this section that all clients perform the same number of local updates. To that end, we will make use of the following lemma. To prove Theorem 5, we will construct an example involving two clients.
Regularized Top-$k$: A Bayesian Framework for Gradient Sparsification
Ali Bereyhi, Ben Liang, Gary Boudreau, Ali Afana
Error accumulation is effective for gradient sparsification in distributed settings: initially-unselected gradient entries are eventually selected as their accumulated error exceeds a certain level. The accumulation essentially behaves as a scaling of the learning rate for the selected entries. Although this property prevents the slow-down of lateral movements in distributed gradient descent, it can deteriorate convergence in some settings. This work proposes a novel sparsification scheme that controls the learning rate scaling of error accumulation. The development of this scheme follows two major steps: first, gradient sparsification is formulated as an inverse probability (inference) problem, and the Bayesian optimal sparsification mask is derived as a maximum-a-posteriori estimator. Using the prior distribution inherited from Top-$k$, we derive a new sparsification algorithm which can be interpreted as a regularized form of Top-$k$. We call this algorithm regularized Top-$k$ (RegTop-$k$). It utilizes past aggregated gradients to evaluate posterior statistics of the next aggregation. It then prioritizes the local accumulated gradient entries based on these posterior statistics. We validate our derivation through numerical experiments. In distributed linear regression, it is observed that while Top-$k$ remains at a fixed distance from the global optimum, RegTop-$k$ converges to the global optimum at significantly higher compression ratios. We further demonstrate the generalization of this observation by employing RegTop-$k$ in distributed training of ResNet-18 on CIFAR-10, where it noticeably outperforms Top-$k$.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)
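The error-accumulation behaviour that motivates RegTop-$k$ can be reproduced in a few lines: with plain Top-$k$ and error feedback, a small entry that is never selected keeps accumulating until it finally wins the selection, at which point several iterations' worth of gradient are applied at once, exactly the implicit learning-rate scaling described above. The sketch below shows only this baseline mechanism; the RegTop-$k$ scoring rule itself is not reproduced here.

```python
import numpy as np

def topk_mask(v, k):
    """Boolean mask selecting the k largest-magnitude entries of v."""
    idx = np.argpartition(np.abs(v), -k)[-k:]
    mask = np.zeros(v.shape, dtype=bool)
    mask[idx] = True
    return mask

# A constant gradient whose last entry is small: under Top-1 with error
# accumulation it is ignored for several iterations and then released in one
# large burst, which acts like a scaled learning rate for that entry.
grad = np.array([1.0, 0.8, 0.3])
memory = np.zeros(3)
for t in range(1, 11):
    acc = grad + memory                 # accumulated (error-corrected) gradient
    sent = np.where(topk_mask(acc, k=1), acc, 0.0)
    memory = acc - sent                 # unsent mass carries over
    if sent[2] != 0.0:
        print(f"iteration {t}: entry 2 released with accumulated value {sent[2]:.1f}")
```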