Gradient Descent







Gradient Sparsification for Communication-Efficient Distributed Optimization

Neural Information Processing Systems

In the synchronous stochastic gradient method, each worker processes a random minibatch of its training data. The local updates are then synchronized in two steps: an All-Reduce step that aggregates the stochastic gradients from all workers, and a Broadcast step that transmits the updated parameter vector back to every worker.
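
As a rough illustration of the synchronization pattern described above, the following single-process sketch simulates the workers in a loop. It shows only the baseline synchronous scheme and not the paper's sparsification method. The least-squares objective, the helper names `all_reduce_mean` and `sync_sgd`, and all hyperparameters are assumptions chosen for the example. The Broadcast step is modeled by every simulated worker reading the same shared vector `w`.

```python
import numpy as np

def all_reduce_mean(grads):
    """Simulated All-Reduce: average the stochastic gradients of all workers."""
    return np.mean(grads, axis=0)

def sync_sgd(X_shards, y_shards, dim, steps=200, lr=0.1, batch=8, seed=0):
    """Synchronous minibatch SGD on a least-squares loss (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)  # parameter vector, identical on every worker
    for _ in range(steps):
        grads = []
        for X, y in zip(X_shards, y_shards):
            # Each worker draws a random minibatch from its own data shard.
            idx = rng.choice(len(y), size=batch, replace=False)
            Xb, yb = X[idx], y[idx]
            grads.append(Xb.T @ (Xb @ w - yb) / batch)
        g = all_reduce_mean(grads)  # All-Reduce: aggregate worker gradients
        w = w - lr * g              # update; sharing w stands in for Broadcast
    return w

# Hypothetical noiseless data split across four simulated workers.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5])
X_shards = [rng.normal(size=(64, 3)) for _ in range(4)]
y_shards = [X @ w_true for X in X_shards]
w_hat = sync_sgd(X_shards, y_shards, dim=3)
```

In a real deployment the loop over shards would run in parallel on separate machines, and the aggregation would be a collective communication call instead of an in-memory average.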





The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization

Neural Information Processing Systems

When they converge, do they converge to local min-max solutions? We characterize the limit points of two basic first-order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA).
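
To make the two update rules concrete, here is a minimal sketch that runs GDA and OGDA on the toy bilinear objective f(x, y) = x * y, whose min-max solution is the origin. The objective, step size, iteration count, and function names are assumptions chosen for illustration; the sketch does not reproduce the paper's characterization of limit points. The OGDA update is written in its common form, with a doubled current gradient minus the previous gradient.

```python
import numpy as np

def grad(x, y):
    # For f(x, y) = x * y: df/dx = y and df/dy = x.
    return y, x

def gda(x, y, eta=0.1, steps=2000):
    """Simultaneous gradient descent (in x) and ascent (in y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def ogda(x, y, eta=0.1, steps=2000):
    """Optimistic GDA: step with 2 * current gradient minus previous gradient."""
    gx_prev, gy_prev = grad(x, y)  # first step reduces to a plain GDA step
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = (x - 2 * eta * gx + eta * gx_prev,
                y + 2 * eta * gy - eta * gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

start_norm = np.hypot(1.0, 1.0)
gda_norm = np.hypot(*gda(1.0, 1.0))    # distance from the origin after GDA
ogda_norm = np.hypot(*ogda(1.0, 1.0))  # distance from the origin after OGDA
```

On this particular objective the GDA iterates spiral away from the origin, since each step multiplies the distance by sqrt(1 + eta^2), while the OGDA iterates contract toward it for this step size.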