SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ to find an $\epsilon$-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems
Pan, Taoxing, Liu, Jun, Wang, Jie
Decentralized optimization algorithms have attracted intensive interest recently, as they have a balanced communication pattern, especially when solving large-scale machine learning problems. The Stochastic Path-Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm that achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost to its centralized counterpart, namely $\mathcal{O}(\epsilon^{-3})$ for finding an $\epsilon$-approximate first-order stationary point. To the best of our knowledge, D-SPIDER-SFO achieves state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.

Introduction

Distributed optimization is a popular technique for solving large-scale machine learning problems Li et al. (2014), ranging from visual object recognition Huang et al. (2017); He et al. (2016) to natural language processing Vaswani et al. (2017); Devlin et al. (2019). For distributed optimization, a set of workers form a connected computational network, and each worker is assigned a portion of the computing task. A centralized network topology, like a parameter server Jianmin et al. (2016); Dean et al. (2012); Li et al. (2014); Zinkevich et al. (2010), consists of a central worker connected to all other workers.
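The abstract does not spell out the update, so the following is only a minimal, deterministic sketch of the decentralized structure it describes: each worker keeps a SPIDER-style gradient tracker `v`, mixes its iterate with its neighbors through a doubly stochastic matrix `W`, then takes a local step. All names and constants here (`W`, `eta`, the ring topology, the quadratic losses) are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

# Illustrative setup (not from the paper): 4 workers on a ring, each
# holding a local quadratic loss f_i(x) = 0.5 * ||x - b_i||^2.
rng = np.random.default_rng(0)
n_nodes, dim = 4, 5
b = rng.normal(size=(n_nodes, dim))

def local_grad(i, x):
    """Gradient of worker i's local loss at x."""
    return x - b[i]

# Doubly stochastic mixing matrix for the ring: each worker averages
# its iterate with its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

eta = 0.1                                # step size (assumed)
x = np.zeros((n_nodes, dim))             # one iterate per worker
v = np.stack([local_grad(i, x[i]) for i in range(n_nodes)])

for t in range(300):
    x_mixed = W @ x                      # gossip: average with neighbors
    x_new = x_mixed - eta * v            # local descent along the estimator
    for i in range(n_nodes):
        # SPIDER-style correction: add the local gradient difference so
        # v_i keeps tracking worker i's gradient at the new iterate.
        v[i] += local_grad(i, x_new[i]) - local_grad(i, x[i])
    x = x_new

# The average iterate approaches the global minimizer mean(b).
print(np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```

In the full method the gradients are stochastic mini-batches and the tracker is reset periodically; this sketch keeps only the gossip-plus-tracking skeleton that distinguishes the decentralized variant from its centralized counterpart.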
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Fang, Cong, Li, Chris Junchi, Lin, Zhouchen, Zhang, Tong
In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose SPIDER-SFO, which solves non-convex stochastic optimization problems using stochastic gradients only. We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ to find an $\epsilon$-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting. Our SPIDER technique can be further applied to find an $(\epsilon, \mathcal{O}(\epsilon^{0.5}))$-approximate second-order stationary point at a gradient computation cost of $\tilde{\mathcal{O}}\left( \min( n^{1/2} \epsilon^{-2}+\epsilon^{-2.5}, \epsilon^{-3} ) \right)$.
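As a concrete reading of this abstract, the sketch below combines the two ingredients it names, the path-integrated estimator and a normalized descent step, on a toy finite-sum problem. The restart period `q`, batch size, and step cap are illustrative assumptions rather than the paper's tuned constants, so treat this as a sketch, not the reference algorithm.

```python
import numpy as np

# Toy finite-sum objective (assumed for illustration):
# f(x) = (1/n) * sum_i 0.5 * ||x - a_i||^2, whose full gradient is x - mean(a).
rng = np.random.default_rng(1)
n, dim = 1000, 10
a = rng.normal(size=(n, dim))

def stoch_grad(x, idx):
    """Mini-batch gradient of f over the samples in idx."""
    return x - a[idx].mean(axis=0)

q, batch = 50, 32        # estimator restart period and mini-batch size (assumed)
step_cap = 0.01          # epsilon-scale cap on the step length (assumed)

x = np.zeros(dim)
v = np.zeros(dim)
for k in range(1000):
    if k % q == 0:
        # Periodically reset the estimator with the full gradient.
        v = stoch_grad(x, np.arange(n))
    # Normalized step: move along -v, with the step length clipped
    # so each iterate travels at most step_cap.
    v_norm = np.linalg.norm(v) + 1e-12
    x_new = x - min(step_cap, 0.5 * v_norm) * (v / v_norm)
    # SPIDER update: path-integrate the mini-batch gradient difference,
    # so v keeps tracking the full gradient at the current iterate.
    idx = rng.integers(0, n, size=batch)
    v = v + stoch_grad(x_new, idx) - stoch_grad(x, idx)
    x = x_new

print("final gradient norm:", np.linalg.norm(x - a.mean(axis=0)))
```

The capped, normalized step is what lets the analysis bound the estimator's drift per iteration: since each iterate moves at most an $\epsilon$-scale distance, the mini-batch gradient differences stay small and `v` remains an accurate full-gradient estimate between resets.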
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Fang, Cong, Li, Chris Junchi, Lin, Zhouchen, Zhang, Tong
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SSO, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. Specifically, we prove that the SPIDER-SFO and SPIDER-SSO algorithms achieve a record-breaking $\tilde{O}(\epsilon^{-3})$ gradient computation cost to find an $\epsilon$-approximate first-order and $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
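For reference, the SPIDER recursion underlying both versions of the abstract, written here from their description (the batch schedule and step-size constants are omitted, so this is a sketch rather than the paper's exact parameterization), is the estimator update

$$ v_k = \nabla f_{S_k}(x_k) - \nabla f_{S_k}(x_{k-1}) + v_{k-1}, \qquad x_{k+1} = x_k - \eta_k \frac{v_k}{\|v_k\|}, $$

with a full-batch reset $v_0 = \nabla f(x_0)$ at the start of each epoch. The two regimes of the first-order cost bound then follow from comparing the terms inside the $\min$: since $n^{1/2}\epsilon^{-2} \le \epsilon^{-3}$ exactly when $n \le \epsilon^{-2}$,

$$ \mathcal{O}\left(\min\left(n^{1/2}\epsilon^{-2},\ \epsilon^{-3}\right)\right) = \begin{cases} \mathcal{O}\left(n^{1/2}\epsilon^{-2}\right) & \text{if } n \le \epsilon^{-2}, \\ \mathcal{O}\left(\epsilon^{-3}\right) & \text{otherwise,} \end{cases} $$

i.e., the finite-sum rate wins for moderate $n$, and the online rate $\epsilon^{-3}$ takes over when $n$ is large or infinite.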