 Gradient Descent


Using Statistics to Automate Stochastic Optimization

Neural Information Processing Systems

Rather than changing the learning rate at each iteration, we propose an approach that automates the most common hand-tuning heuristic: use a constant learning rate until "progress stops", then drop. We design an explicit statistical test that determines when the dynamics of stochastic gradient descent reach a stationary distribution.
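The hand-tuning heuristic named above (hold the learning rate constant until "progress stops", then drop it) can be sketched on a toy problem. This is only an illustration: it replaces the paper's statistical test with a crude comparison of windowed average losses, and the objective, the function name `sgd_with_plateau_drops`, and all parameter values (`window`, `drop`, `tol`) are assumptions made for this example.

```python
import random

def sgd_with_plateau_drops(steps=3000, lr=0.5, window=200, drop=0.1,
                           tol=0.01, seed=0):
    """SGD on the noisy 1-D objective 0.5 * x**2.

    Hypothetical plateau rule (NOT the paper's statistical test): every
    `window` steps, compare the mean loss of the latest window with the
    previous one and multiply lr by `drop` if it did not improve by at
    least a relative margin `tol`.
    """
    rng = random.Random(seed)
    x = 5.0
    losses = []        # loss after every step
    lrs = []           # learning rate in effect after every step
    prev_mean = None   # mean loss of the previous window, if any
    for t in range(1, steps + 1):
        grad = x + rng.gauss(0.0, 1.0)   # stochastic gradient of 0.5 * x**2
        x -= lr * grad
        losses.append(0.5 * x * x)
        if t % window == 0:
            cur_mean = sum(losses[-window:]) / window
            if prev_mean is not None and cur_mean > (1.0 - tol) * prev_mean:
                lr *= drop        # progress has stalled: drop the rate
                prev_mean = None  # restart the comparison at the new rate
            else:
                prev_mean = cur_mean
        lrs.append(lr)
    return x, lrs
```

With a constant rate the iterates settle into a noisy stationary regime, so the windowed means stop improving and a drop is triggered; the comparison of two noisy averages is exactly the kind of ad hoc decision that a principled test is meant to replace.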





Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction

Neural Information Processing Systems

We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics.



RSN: Randomized Subspace Newton

Neural Information Processing Systems

We develop a randomized Newton method capable of solving learning problems with very high-dimensional feature spaces, a common setting in applications such as medical imaging, genomics, and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments that demonstrate the efficiency of our method compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi [18].
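One way to read "the Newton direction constrained to the space spanned by a random sketch" is as the update x ← x − step · S (SᵀHS)⁻¹ Sᵀg, where g and H are the gradient and Hessian at x and S is a random sketch matrix. The sketch below assumes that form with a two-column Gaussian S on a small quadratic; the test problem, the helper names, and the `step` parameter are hypothetical choices for illustration and are not taken from the paper.

```python
import random

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subspace_newton_step(x, g, H, rng, step=1.0):
    """One Newton step restricted to span{s1, s2}, with s1, s2 Gaussian."""
    d = len(x)
    s1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    s2 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    Hs1, Hs2 = matvec(H, s1), matvec(H, s2)
    # Sketched Hessian S^T H S = [[a, b], [b, c]] and sketched gradient S^T g.
    a, b, c = dot(s1, Hs1), dot(s1, Hs2), dot(s2, Hs2)
    r1, r2 = dot(s1, g), dot(s2, g)
    det = a * c - b * b
    if det <= 1e-12:          # degenerate sketch: skip this step
        return list(x)
    # Solve the 2x2 system (S^T H S) lam = S^T g by Cramer's rule.
    l1 = (c * r1 - b * r2) / det
    l2 = (a * r2 - b * r1) / det
    return [xi - step * (l1 * u + l2 * v) for xi, u, v in zip(x, s1, s2)]

# Assumed test problem: f(x) = 0.5 * x^T A x - b^T x with A positive definite.
A = [[4.0, 1.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 5.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 3.0]]
b_vec = [1.0, -2.0, 3.0, 0.5, -1.0]

def f(x):
    return 0.5 * dot(x, matvec(A, x)) - dot(b_vec, x)

def grad(x):
    return [ai - bi for ai, bi in zip(matvec(A, x), b_vec)]

def run(iters=200, seed=0):
    rng = random.Random(seed)
    x = [0.0] * 5
    values = [f(x)]
    for _ in range(iters):
        x = subspace_newton_step(x, grad(x), A, rng)
        values.append(f(x))
    return x, values
```

Each step only needs the Hessian applied to the sketch columns and the solution of a system whose size equals the sketch dimension, which is what makes this kind of update attractive when the feature dimension is very large. On a quadratic with `step=1.0`, every step minimizes f exactly over the sketched subspace, so the objective values in `run` never increase.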


the liberty to group and reword some of the reviewers' comments (in blue italic) to save space. 3 General answer on the usefulness of gradient descent, its theoretical guarantees, and its scalability

Neural Information Processing Systems

We thank the reviewers for the time they spent evaluating our manuscript and for their valuable comments. We agree that having theoretical guarantees would be a big plus. As for scalability, the bottleneck of our method is the single-linkage algorithm. Similarly to Monath et al. (NeurIPS 2017), our idea consists [...] Given the significant body of additional material, we feel that this topic is best left to a future publication. Lines 8, 56, 70, 93: I would suggest a more cautious usage of the word "equivalent".


Explainable Learning Rate Regimes for Stochastic Optimization

arXiv.org Artificial Intelligence

Modern machine learning models are trained by stochastic gradient descent (SGD), whose performance critically depends on how the learning rate (LR) is adjusted and decreased over time. Yet existing LR regimes may be intricate, or may require manually tuning one or more additional hyper-parameters, at a high cost in computation, time, and power in practice. This work clarifies, in a natural and direct manner, how the LR should be updated automatically according only to the intrinsic variation of stochastic gradients. An explainable LR regime is developed by leveraging stochastic second-order algorithms; it behaves similarly to heuristic algorithms but is simple to implement and requires no parameter tuning, and it follows an automatic procedure in which the LR increases (decreases) as the norm of the stochastic gradients decreases (increases). The resulting LR regime demonstrates efficiency, robustness, and scalability across different classical stochastic algorithms, including SGD, SGDM, and SIGNSGD, on machine learning tasks.
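The qualitative rule in this abstract (a larger LR when the stochastic gradient norm is small, a smaller LR when it is large) can be illustrated with a toy schedule. The inverse-norm formula below is an assumption chosen only to show that relationship; it is not the regime derived in the paper, and `inverse_norm_lr`, `base_lr`, `max_lr`, and `eps` are hypothetical names.

```python
import math

def inverse_norm_lr(grad, base_lr=0.1, max_lr=1.0, eps=1e-8):
    """Toy LR rule (an assumption, not the paper's formula).

    The learning rate is inversely proportional to the Euclidean norm of
    the stochastic gradient, capped at `max_lr` so that a vanishing
    gradient does not produce an unbounded step size.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    return min(max_lr, base_lr / (norm + eps))

# A small gradient norm yields a larger LR than a large one.
lr_small_grad = inverse_norm_lr([0.3, 0.4])   # norm 0.5
lr_large_grad = inverse_norm_lr([3.0, 4.0])   # norm 5.0
```

The cap `max_lr` is a design choice added for this sketch; without it, the rule would take arbitrarily large steps near a stationary point.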