SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ to find an $\epsilon$-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems
Pan, Taoxing, Liu, Jun, Wang, Jie
Decentralized optimization algorithms have attracted intensive interest recently, as they have a balanced communication pattern, especially when solving large-scale machine learning problems. The Stochastic Path-Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm that achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost to its centralized counterpart, namely $\mathcal{O}(\epsilon^{-3})$ for finding an $\epsilon$-approximate first-order stationary point. To the best of our knowledge, D-SPIDER-SFO achieves state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.

Introduction

Distributed optimization is a popular technique for solving large-scale machine learning problems Li et al. (2014), ranging from visual object recognition Huang et al. (2017); He et al. (2016) to natural language processing Vaswani et al. (2017); Devlin et al. (2019). For distributed optimization, a set of workers form a connected computational network, and each worker is assigned a portion of the computing task. A centralized network topology, like a parameter server Jianmin et al. (2016); Dean et al. (2012); Li et al. (2014); Zinkevich et al. (2010), consists of a central worker connected to all other workers.
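The abstract does not spell out the update, so the following is only a minimal, deterministic sketch of the decentralized structure it describes: each worker keeps a SPIDER-style gradient tracker `v`, mixes its iterate with its neighbors through a doubly stochastic matrix `W`, then takes a local step. All names and constants here (`W`, `eta`, the ring topology, the quadratic losses) are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

# Illustrative setup (not from the paper): 4 workers on a ring, each
# holding a local quadratic loss f_i(x) = 0.5 * ||x - b_i||^2.
rng = np.random.default_rng(0)
n_nodes, dim = 4, 5
b = rng.normal(size=(n_nodes, dim))

def local_grad(i, x):
    """Gradient of worker i's local loss at x."""
    return x - b[i]

# Doubly stochastic mixing matrix for the ring: each worker averages
# its iterate with its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

eta = 0.1                                # step size (assumed)
x = np.zeros((n_nodes, dim))             # one iterate per worker
v = np.stack([local_grad(i, x[i]) for i in range(n_nodes)])

for t in range(300):
    x_mixed = W @ x                      # gossip: average with neighbors
    x_new = x_mixed - eta * v            # local descent along the estimator
    for i in range(n_nodes):
        # SPIDER-style correction: add the local gradient difference so
        # v_i keeps tracking worker i's gradient at the new iterate.
        v[i] += local_grad(i, x_new[i]) - local_grad(i, x[i])
    x = x_new

# The average iterate approaches the global minimizer mean(b).
print(np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```

In the full method the gradients are stochastic mini-batches and the tracker is reset periodically; this sketch keeps only the gossip-plus-tracking skeleton that distinguishes the decentralized variant from its centralized counterpart.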
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Fang, Cong, Li, Chris Junchi, Lin, Zhouchen, Zhang, Tong
In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose SPIDER-SFO, which solves non-convex stochastic optimization problems using stochastic gradients only. We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ to find an $\epsilon$-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting. Our SPIDER technique can be further applied to find an $(\epsilon, \mathcal{O}(\epsilon^{0.5}))$-approximate second-order stationary point at a gradient computation cost of $\tilde{\mathcal{O}}\left( \min( n^{1/2} \epsilon^{-2}+\epsilon^{-2.5}, \epsilon^{-3} ) \right)$.
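As a concrete reading of this abstract, the sketch below combines the two ingredients it names, the path-integrated estimator and a normalized descent step, on a toy finite-sum problem. The restart period `q`, batch size, and step cap are illustrative assumptions rather than the paper's tuned constants, so treat this as a sketch, not the reference algorithm.

```python
import numpy as np

# Toy finite-sum objective (assumed for illustration):
# f(x) = (1/n) * sum_i 0.5 * ||x - a_i||^2, whose full gradient is x - mean(a).
rng = np.random.default_rng(1)
n, dim = 1000, 10
a = rng.normal(size=(n, dim))

def stoch_grad(x, idx):
    """Mini-batch gradient of f over the samples in idx."""
    return x - a[idx].mean(axis=0)

q, batch = 50, 32        # estimator restart period and mini-batch size (assumed)
step_cap = 0.01          # epsilon-scale cap on the step length (assumed)

x = np.zeros(dim)
v = np.zeros(dim)
for k in range(1000):
    if k % q == 0:
        # Periodically reset the estimator with the full gradient.
        v = stoch_grad(x, np.arange(n))
    # Normalized step: move along -v, with the step length clipped
    # so each iterate travels at most step_cap.
    v_norm = np.linalg.norm(v) + 1e-12
    x_new = x - min(step_cap, 0.5 * v_norm) * (v / v_norm)
    # SPIDER update: path-integrate the mini-batch gradient difference,
    # so v keeps tracking the full gradient at the current iterate.
    idx = rng.integers(0, n, size=batch)
    v = v + stoch_grad(x_new, idx) - stoch_grad(x, idx)
    x = x_new

print("final gradient norm:", np.linalg.norm(x - a.mean(axis=0)))
```

The capped, normalized step is what lets the analysis bound the estimator's drift per iteration: since each iterate moves at most an $\epsilon$-scale distance, the mini-batch gradient differences stay small and `v` remains an accurate full-gradient estimate between resets.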
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Fang, Cong, Li, Chris Junchi, Lin, Zhouchen, Zhang, Tong
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SSO, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. Specifically, we prove that the SPIDER-SFO and SPIDER-SSO algorithms achieve a record-breaking $\tilde{O}(\epsilon^{-3})$ gradient computation cost to find an $\epsilon$-approximate first-order and $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
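For reference, the SPIDER recursion underlying both versions of the abstract, written here from their description (the batch schedule and step-size constants are omitted, so this is a sketch rather than the paper's exact parameterization), is the estimator update

$$ v_k = \nabla f_{S_k}(x_k) - \nabla f_{S_k}(x_{k-1}) + v_{k-1}, \qquad x_{k+1} = x_k - \eta_k \frac{v_k}{\|v_k\|}, $$

with a full-batch reset $v_0 = \nabla f(x_0)$ at the start of each epoch. The two regimes of the first-order cost bound then follow from comparing the terms inside the $\min$: since $n^{1/2}\epsilon^{-2} \le \epsilon^{-3}$ exactly when $n \le \epsilon^{-2}$,

$$ \mathcal{O}\left(\min\left(n^{1/2}\epsilon^{-2},\ \epsilon^{-3}\right)\right) = \begin{cases} \mathcal{O}\left(n^{1/2}\epsilon^{-2}\right) & \text{if } n \le \epsilon^{-2}, \\ \mathcal{O}\left(\epsilon^{-3}\right) & \text{otherwise,} \end{cases} $$

i.e., the finite-sum rate wins for moderate $n$, and the online rate $\epsilon^{-3}$ takes over when $n$ is large or infinite.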