 Gradient Descent





Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

Neural Information Processing Systems

Although some recent studies have proposed stochastic algorithms with fast convergence rates for min-max problems, they require additional assumptions about the problem, e.g.,


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

NIPS (Neural Information Processing Systems), 8-11 December 2014, Montreal, Canada. Paper ID: 1527. Title: "Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning". Current Reviews: First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. This paper considers asynchronous parallel updates in stochastic gradient descent with delays. This is a very important problem in large-scale distributed data processing. The objective of the problem studied in this paper is to achieve regret bounds similar to the ones obtained by adaptive gradient (i.e. This boils down to keeping track of updates to gradient coordinates.


A Detailed comparisons with related work

Neural Information Processing Systems

In Table 1, we compare our agnostic learning results. Our results in this setting come from Theorem 3.3. We note that the sample complexity for Diakonikolas et al. To prove Lemma 3.5, we use the following result of Yehudai and Shamir [35]. We first consider the case when σ satisfies Assumption 3.1.



Understanding the Role of Momentum in Stochastic Gradient Methods

Neural Information Processing Systems

The use of momentum in stochastic gradient methods has become a widespread practice in machine learning. Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks.



Federated Accelerated Stochastic Gradient Descent

Neural Information Processing Systems

Leveraging distributed computing resources and decentralized data is crucial, if not necessary, for large-scale machine learning applications.