Goto

Collaborating Authors

 Gradient Descent





Learning with SGD and Random Features

Neural Information Processing Systems

Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning.


New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity

Neural Information Processing Systems

As an incremental-gradient algorithm, the hybrid stochastic gradient descent (HS-GD) enjoys merits of both stochastic and full gradient methods for finite-sum problem optimization.


Connecting Optimization and Regularization Paths

Neural Information Processing Systems

Consequently, a line of work has focused on characterizing the implicit biases of global optimum reached by various optimization algorithms. For example, Gunasekar et al. [ 2017 ] consider the problem of matrix factorization and show that gradient descent (GD) on un-regularized objective converges to the minimum nuclear norm solution.





The Physical Systems Behind Optimization Algorithms

Neural Information Processing Systems

In particular, we study gradient descent, proximal gradient descent, coordinate gradient descent, proximal coordinate gradient, and Newton's methods as well as their Nesterov's accelerated variants in a unified framework motivated by a