 Gradient Descent



Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes

Neural Information Processing Systems

Sharma et al. (2022) provide ... Yang et al. (2022a) integrate Local SGDA with stochastic gradient estimators to eliminate the ... More recently, Zhang et al. (2023) adopt compressed momentum methods with Local SGD to increase the communication efficiency of the algorithm. For centralized nonconvex minimax problems, Yang et al. (2022b) show that, even in deterministic settings, GDA-based methods necessitate a timescale separation between the stepsizes for the primal and dual updates.
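To make the notion of timescale separation concrete, here is a minimal sketch of simultaneous gradient descent ascent (GDA) with two different stepsizes. The toy objective f(x, y) = x*y - 0.5*y**2, the function names `grad_f` and `gda`, and the stepsize values are illustrative assumptions; none of them come from the papers cited above.

```python
def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy) of the toy objective
    f(x, y) = x*y - 0.5*y**2 (an assumed example, concave in y)."""
    return y, x - y


def gda(x, y, eta_x, eta_y, steps):
    """Simultaneous GDA: descend in the primal variable x with stepsize
    eta_x and ascend in the dual variable y with stepsize eta_y."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Both variables are updated from the same (old) iterate.
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y
```

Calling `gda(1.0, 0.0, eta_x=0.1, eta_y=0.5, steps=500)` uses a dual stepsize five times larger than the primal one, so y tracks its best response to x while x moves slowly. On this toy problem the iterates approach the saddle point at the origin.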






Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion

Neural Information Processing Systems

The matrix completion problem seeks to recover a d × d ground truth matrix of low rank r ≪ d from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with d so large that even the simplest full-dimension vector operations with O(d) time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least O(κ log(1/ϵ)) iterations to get ϵ-close to a ground truth matrix with condition number κ. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to κ. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to ϵ-accuracy in O(log(1/ϵ)) iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with κ = 1. In our experiments, we observe a similar acceleration for item-item collaborative filtering on the MovieLens25M dataset via a pair-wise ranking loss, with 100 million training pairs and 10 million testing pairs.
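As a rough illustration of what preconditioning an SGD step can look like for symmetric matrix completion, the sketch below factors the ground truth as X Xᵀ with a d × 2 factor X and rescales each stochastic gradient by (XᵀX)⁻¹. The abstract does not state the form of the preconditioner, so this choice, the fixed rank r = 2, the per-entry squared loss, and the helper names `gram_inverse_2x2`, `apply`, and `sgd_step` are all assumptions made for illustration rather than the authors' algorithm.

```python
def gram_inverse_2x2(X):
    """Return (X^T X)^{-1} for a d x 2 factor X (list of 2-element rows),
    using the closed-form inverse of a symmetric 2x2 matrix."""
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    c = sum(row[1] * row[1] for row in X)
    det = a * c - b * b
    return [[c / det, -b / det], [-b / det, a / det]]


def apply(P, v):
    """Multiply the 2x2 matrix P by the length-2 vector v."""
    return [P[0][0] * v[0] + P[0][1] * v[1],
            P[1][0] * v[0] + P[1][1] * v[1]]


def sgd_step(X, i, j, m_ij, lr, precondition=True):
    """One in-place stochastic step on 0.5 * (x_i . x_j - m_ij)**2 for a
    single observed entry m_ij. Returns the residual before the update."""
    xi, xj = list(X[i]), list(X[j])  # copies of the rows before updating
    residual = xi[0] * xj[0] + xi[1] * xj[1] - m_ij
    # Assumed preconditioner: (X^T X)^{-1}. It is recomputed from scratch
    # here for clarity, which costs O(d) work per step.
    P = gram_inverse_2x2(X) if precondition else [[1.0, 0.0], [0.0, 1.0]]
    # The gradient for row i is residual * x_j, and for row j it is
    # residual * x_i; each is rescaled by P.
    gi, gj = apply(P, xj), apply(P, xi)
    for k in range(2):
        X[i][k] -= lr * residual * gi[k]
        X[j][k] -= lr * residual * gj[k]
    return residual
```

Setting `precondition=False` gives the plain SGD step on the same loss, which makes the two updates easy to compare on a synthetic ill-conditioned ground truth. Because this sketch rebuilds XᵀX at every step, it does not avoid the O(d) per-step cost that the abstract describes as prohibitive; a huge-scale implementation would have to maintain the preconditioner incrementally.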


A Sampling using Flows

Neural Information Processing Systems

Neural transport augmented samplers have been subsequently extended by Hoffman et al. (2019) ... While Duncan et al. (2019) have studied the ... Another contribution of this paper is learning equivariant Energy-Based Models using equivariant Stein variational gradient descent. Energy-Based Models have witnessed a revival recently. As far as the authors are aware ... Figure 8: Recommended to view in color. Translucent yellow dots represent the distribution.