SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning
Redie, Dawit Kiros, Arablouei, Reza, Werner, Stefan
Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient α = 0 and step-ahead EF (SAEF) when α = 1. For non-convex objectives and δ-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. To balance SAEF's rapid warm-up with EF's long-term stability, we select α near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.

Modern large-scale machine learning increasingly relies on distributed computation, where both data and compute are spread across many devices. Federated learning (FL) enables model training in this setting without centralizing raw data, enhancing privacy and scalability under heterogeneous client distributions (McMahan et al., 2017; Kairouz et al., 2021). In each synchronous FL round, the server broadcasts the current global model to a subset of clients. These clients perform several steps of stochastic gradient descent (SGD) on their local data and return updates to the server, which aggregates them to form the next global iterate (Huang et al., 2022; Wang & Ji, 2022; Li et al., 2024). Although FL leverages rich distributed data, it faces two key challenges.
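The error-feedback mechanism at the core of SA-PEF is easiest to see in its α = 0 special case, classic EF: each round, the client compresses its update plus the carried-over residual, transmits the compressed part, and stores the remainder for the next round, so small coordinates are delayed rather than lost. A minimal sketch with a top-k contractive compressor (the function names and 4-dimensional toy values are ours; the paper's SA correction and partial residual retention are not reproduced here):

```python
import numpy as np

def topk(v, k):
    """A delta-contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(w, grad, residual, lr=0.1, k=2):
    """One classic error-feedback step: compress the scaled gradient plus the
    carried residual, apply the compressed part, keep the rest as new residual."""
    p = lr * grad + residual
    delta = topk(p, k)
    return w - delta, p - delta  # new iterate, new residual

w = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, 0.1, 0.05])
w, e = ef_step(w, g, np.zeros(4))
# the two small update coordinates are not applied this round,
# but they persist in the residual e and will be flushed later
```

This delayed accumulation is exactly the residual whose slow decay under non-IID data motivates the step-ahead correction.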
Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget
Vardhan, Harsh, Mazumdar, Arya
Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One could independently encode and decode these to achieve compression, but that overlooks the fact that these vectors are often close to each other. To exploit these similarities, Suresh et al., 2022, Jhunjhunwala et al., 2021, and Jiang et al., 2023, recently proposed multiple correlation-aware compression schemes. However, in most cases, the correlations have to be known for these schemes to work. Moreover, a theoretical analysis of graceful degradation of these correlation-aware compression schemes with increasing dissimilarity is limited to only the $\ell_2$-error in the literature. In this paper, we propose four different collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. Our schemes are all simple to implement and computationally efficient, while resulting in substantial savings in communication. The analysis of our proposed schemes shows how the $\ell_2$, $\ell_\infty$ and cosine estimation errors vary with the degree of similarity among the vectors.
Error Compensated Distributed SGD Can Be Accelerated
Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that can work for any compressor satisfying a certain contraction property, which includes both unbiased (after appropriate scaling) and biased compressors such as RandK and TopK. Applied naively, gradient compression introduces errors that either slow down convergence or lead to divergence. A popular technique designed to tackle this issue is error compensation/error feedback. Due to the difficulties associated with analyzing biased compressors, it is not known whether gradient compression with error compensation can be combined with acceleration. In this work, we show for the first time that error compensated gradient compression methods can be accelerated. In particular, we propose and study the error compensated loopless Katyusha method, and establish an accelerated linear convergence rate under standard assumptions. We show through numerical experiments that the proposed method converges with substantially fewer communication rounds than previous error compensated algorithms.
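The contraction property mentioned above requires the (possibly biased) compressor $C$ to satisfy $\|C(x) - x\|^2 \le (1 - \delta)\|x\|^2$ for some $\delta \in (0, 1]$. TopK satisfies it deterministically with $\delta = k/d$, since discarding all but the $k$ largest-magnitude coordinates leaves at most a $(1 - k/d)$ fraction of the squared norm behind. A quick numerical check (toy dimensions are ours):

```python
import numpy as np

def topk(x, k):
    """Biased TopK compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Verify ||topk(x) - x||^2 <= (1 - k/d) ||x||^2 over random inputs,
# i.e. TopK is delta-contractive with delta = k/d.
rng = np.random.default_rng(0)
d, k = 100, 10
worst = 0.0
for _ in range(1000):
    x = rng.normal(size=d)
    ratio = np.linalg.norm(topk(x, k) - x) ** 2 / np.linalg.norm(x) ** 2
    worst = max(worst, ratio)
```

RandK (after its d/k rescaling) satisfies the same bound in expectation, which is why methods proved under this single contraction property cover both compressor families at once.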
Lossy Compression for Lossless Prediction
Most data is automatically collected and only ever seen by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than 1000x on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.