Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity
Dong, Sally, Jiang, Haotian, Lee, Yin Tat, Padmanabhan, Swati, Ye, Guanghao
–arXiv.org Artificial Intelligence
Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in R^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
arXiv.org Artificial Intelligence
Aug-7-2022
- Country:
- Asia > Russia (0.04)
- North America > United States
- Washington > King County
- Seattle (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Washington > King County
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- United Kingdom > England
- Genre:
- Research Report (0.64)
- Technology: