Escaping Saddle Points with Compressed SGD
–Neural Information Processing Systems
Stochastic Gradient Descent (SGD) and its variants are the main workhorses of modern machine learning. Distributed implementations of SGD on a cluster of machines with a central server and a large number of workers are frequently used in practice due to the massive size of the data. In distributed SGD, each machine holds a copy of the model and the computation proceeds in rounds. In every round, each worker computes a stochastic gradient based on its batch of examples; the server averages these stochastic gradients to obtain the gradient of the entire batch, makes an SGD step, and broadcasts the updated model parameters to the workers. With a large number of workers, computation parallelizes efficiently while communication becomes the main bottleneck [Chilimbi et al., 2014, Strom, 2015], since each worker needs to send its gradients to the server and receive the updated model parameters. Common solutions to this problem include: local SGD and its variants, in which each machine performs multiple local steps before communication [Stich, 2018]; decentralized architectures, which allow pairwise communication between the workers [McMahan et al., 2017]; and gradient compression, in which a compressed version of the gradient is communicated instead of the full gradient [Bernstein et al., 2018, Stich et al., 2018, Karimireddy et al., 2019]. In this work, we consider the latter approach, which we refer to as compressed SGD. Most machine learning models can be described by a d-dimensional vector of parameters x, and the model quality can be estimated as a function f(x).
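The round structure described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a top-k sparsifier as the compressor (one common choice in the cited gradient-compression literature) and simulates the workers' noisy stochastic gradients on a toy quadratic objective; the function names `topk_compress` and `compressed_sgd_round` are illustrative, not from the source.

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude coordinates of g (a common compressor);
    only these k values and their indices would need to be communicated."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def compressed_sgd_round(x, worker_grads, lr=0.1, k=2):
    """One round of compressed SGD: each worker compresses its stochastic
    gradient, the server averages the compressed gradients and takes an
    SGD step on the shared model parameters x."""
    compressed = [topk_compress(g, k) for g in worker_grads]
    avg = np.mean(compressed, axis=0)
    return x - lr * avg

# Toy objective f(x) = ||x||^2 / 2, whose gradient at x is x itself;
# each of 4 workers sees the gradient perturbed by small Gaussian noise.
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0, 0.5])
for _ in range(100):
    grads = [x + 0.01 * rng.standard_normal(x.size) for _ in range(4)]
    x = compressed_sgd_round(x, grads, lr=0.1, k=2)
print(np.linalg.norm(x))
```

Each worker here transmits only k of the d coordinates per round, which is the communication saving that motivates compressed SGD; the norm of x shrinks toward the noise floor over the rounds.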