

Review for NeurIPS paper: A Catalyst Framework for Minimax Optimization

Neural Information Processing Systems

Additional Feedback (questions in random order):
- Would it be possible to provide dependences on the diameter(s) D_Y (and D_X?) in Table 1?
- Reference for point (ii), page 3?
- Line 147: although this additional evaluation is certainly "negligible" for deterministic methods, is it really the case for stochastic ones? Was this cost taken into account in the numerical experiments? I guess there should be no gain (due to the lower bound and EG), but, e.g., do we also lose the logarithmic factor? If not, please make this more explicit (e.g., in the abstract; "state-of-the-art" makes it somewhat implicit).

To go further:
- Is it possible to use the method with rough estimates of mu and/or l?
- (Lines 42-54): Given that there is no known optimal algorithm, is it possible that the lower bound is not tight? In particular, the word "first" in the abstract is probably a bit abusive, given that there exist closely related methods for closely related settings (e.g., [40]).


Review for NeurIPS paper: A Catalyst Framework for Minimax Optimization

Neural Information Processing Systems

The paper received positive feedback. After reading the rebuttal and discussing the paper, the general consensus is that the paper should be accepted. The area chair agrees with this assessment and follows the reviewers' recommendation. Several suggestions were made to improve the paper (see in particular R1's review), which would be good to take into account for the final version.


A Catalyst Framework for Minimax Optimization

Neural Information Processing Systems

We introduce a generic \emph{two-loop} scheme for smooth minimax optimization with strongly-convex-concave objectives. Despite its simplicity, this leads to a family of near-optimal algorithms with improved complexity over all existing methods designed for strongly-convex-concave minimax problems. Additionally, we obtain the first variance-reduced algorithms for this class of minimax problems with finite-sum structure and establish even faster convergence rates. Furthermore, when extended to nonconvex-concave minimax optimization, our algorithm again achieves the state-of-the-art complexity for finding a stationary point. We carry out several numerical experiments showcasing the superiority of the Catalyst framework in practice.
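The two-loop idea can be sketched on a toy quadratic saddle-point problem: an outer loop re-centers a proximal term in x, and an inner solver handles the regularized (now strongly-convex-strongly-concave) subproblem. Everything below (the problem constants, step sizes, and the plain gradient descent-ascent inner solver) is an illustrative assumption, not the paper's exact method.

```python
import numpy as np

# Toy problem: min_x max_y f(x, y) = 0.5*a*x^2 + b*x*y - 0.5*c*y^2,
# strongly convex in x, strongly concave in y, saddle point at (0, 0).
a, b, c = 1.0, 2.0, 1.0   # illustrative constants
tau = 1.0                 # proximal regularization weight (assumed)

def grad_x(x, y, x_anchor):
    # gradient in x of the regularized objective f(x, y) + (tau/2)*(x - x_anchor)^2
    return a * x + b * y + tau * (x - x_anchor)

def grad_y(x, y):
    # gradient in y of f(x, y)
    return b * x - c * y

def inner_gda(x_anchor, y0, steps=200, lr=0.1):
    # Inner loop: plain simultaneous gradient descent-ascent
    # on the proximally regularized subproblem.
    x, y = x_anchor, y0
    for _ in range(steps):
        x, y = x - lr * grad_x(x, y, x_anchor), y + lr * grad_y(x, y)
    return x, y

# Outer loop: repeatedly re-center the proximal term (the "catalyst" step).
x_t, y_t = 5.0, -3.0
for _ in range(30):
    x_t, y_t = inner_gda(x_t, y_t)

print(x_t, y_t)  # both iterates contract toward the saddle point (0, 0)
```

On this quadratic the outer map contracts the iterate by a constant factor per proximal step, so a modest number of outer iterations suffices; the paper's contribution lies in choosing tau and the inner solver to get near-optimal overall complexity.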


Guide To Catalyst - A PyTorch Framework For Accelerated Deep Learning - Analytics India Magazine

#artificialintelligence

Catalyst is a PyTorch framework developed with the intent of advancing research and development in the domain of deep learning. It enables code reusability, reproducibility and rapid experimentation, so that users can conveniently create deep learning models and pipelines without writing yet another training loop. The Catalyst framework is part of the PyTorch ecosystem, a collection of numerous tools and libraries for AI development. It is also part of the Catalyst Ecosystem, an MLOps ecosystem that expedites the training, analysis and deployment of deep learning experiments through the Catalyst, Alchemy and Reaction frameworks respectively. We use the well-known MNIST dataset, which has 10 output classes (images of handwritten digits from 0 to 9).


A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

Lee, Ching-pei, Lim, Cong Han, Wright, Stephen J.

arXiv.org Machine Learning

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations of the smooth part, and we describe how to maintain an approximation of the (generalized) Hessian and solve subproblems efficiently in a distributed manner. When applied to the distributed dual ERM problem, unlike state-of-the-art methods that use only the block-diagonal part of the Hessian, our approach is able to utilize global curvature information and is thus orders of magnitude faster. The proposed method enjoys global linear convergence for a broad range of non-strongly convex problems that includes the most commonly used ERMs, thus requiring lower communication complexity. It also converges on non-convex problems, and so has the potential to be used on applications such as deep learning. Computational results demonstrate that our method significantly improves on communication cost and running time over the current state-of-the-art methods.
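The "successive quadratic approximation of the smooth part" step can be illustrated on a single machine with the crudest possible curvature model, H = L·I, for an L1-regularized least-squares problem; each quadratic subproblem then has a closed-form soft-thresholding solution. All names and constants below are illustrative assumptions, and the sketch deliberately omits the distributed and quasi-Newton machinery that is the paper's actual contribution.

```python
import numpy as np

# Nonsmooth-regularized ERM instance (lasso):
#   F(w) = 0.5*||A w - b||^2 + lam*||w||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [1.0, -2.0, 0.5]
b = A @ w_true
lam = 0.1

# Scalar curvature model H = L*I, with L the Lipschitz constant
# of the smooth gradient (a stand-in for a real quasi-Newton matrix).
L = np.linalg.norm(A.T @ A, 2)

def soft_threshold(z, t):
    # closed-form prox of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

w = np.zeros(10)
for _ in range(500):
    g = A.T @ (A @ w - b)  # gradient of the smooth part
    # Minimize the quadratic model g^T d + (L/2)||d||^2 + lam*||w + d||_1
    # exactly via soft-thresholding, then take the full step.
    w = soft_threshold(w - g / L, lam / L)

obj = 0.5 * np.sum((A @ w - b) ** 2) + lam * np.abs(w).sum()
print(obj)
```

With a genuine (generalized-Hessian or quasi-Newton) matrix in place of L·I, the subproblem no longer has a closed form and must itself be solved iteratively, which is where the paper's distributed subproblem solver comes in.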