AITopics | spiderboost

Variance Reduction with Sparse Gradients

Elibol, Melih, Lei, Lihua, Jordan, Michael I.

arXiv.org Machine Learning, Jan-27-2020

Abstract: Variance reduction methods such as SVRG (Johnson & Zhang, 2013) and SpiderBoost (Wang et al., 2018) use a mixture of large and small batch gradients to reduce the variance of stochastic gradients. Compared to SGD (Robbins & Monro, 1951), these methods require at least double the number of operations per update to model parameters. To reduce the computational cost of these methods, we introduce a new sparsity operator: the random-top-k operator. Our operator reduces computational complexity by estimating gradient sparsity exhibited in a variety of applications by combining the top-k operator (Stich et al., 2018; Aji & Heafield, 2017) and the randomized coordinate descent operator. With this operator, large batch gradients offer an extra benefit beyond variance reduction: a reliable estimate of gradient sparsity. Theoretically, our algorithm is at least as good as the best algorithm (SpiderBoost), and further excels in performance whenever the random-top-k operator captures gradient sparsity. Empirically, our algorithm consistently outperforms SpiderBoost using various models on various tasks including image classification, natural language processing, and sparse matrix factorization. We also provide empirical evidence to support the intuition behind our algorithm via a simple gradient entropy computation, which serves to quantify gradient sparsity at every iteration. SGD updates the iterate $x$ with $x - \eta \nabla f_I(x)$, where $\eta$ is the learning rate and $\nabla f_I(x)$ is the batch stochastic gradient, i.e. $\nabla f_I(x) = \frac{1}{|I|} \sum_{i \in I} \nabla f_i(x)$.
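The abstract describes the random-top-k operator only as a combination of the top-k operator and the randomized coordinate descent operator. The sketch below is one plausible reading of that description, not the paper's definition: it keeps the `k1` coordinates of `x` where a reference vector `y` (for example, a large batch gradient) has the largest magnitude, then keeps `k2` of the remaining coordinates chosen uniformly at random, rescaled so the random part is unbiased. The function name `random_top_k`, the parameters `k1` and `k2`, and the rescaling factor are assumptions made for illustration.

```python
import random


def random_top_k(x, y, k1, k2, rng=random):
    """Sparsify x (hypothetical sketch, not the paper's exact operator).

    Keeps the k1 coordinates of x where |y| is largest, plus k2 of the
    remaining coordinates sampled uniformly without replacement. The sampled
    coordinates are rescaled by (d - k1) / k2 so that, in expectation, the
    random part equals x on the non-top coordinates.
    """
    d = len(x)
    if k1 < 0 or k2 < 0 or k1 + k2 > d:
        raise ValueError("need k1 >= 0, k2 >= 0 and k1 + k2 <= len(x)")

    # Coordinate indices ordered by decreasing magnitude of the reference y.
    order = sorted(range(d), key=lambda i: abs(y[i]), reverse=True)
    top, rest = order[:k1], order[k1:]

    out = [0.0] * d
    for i in top:                      # deterministic top-k1 part
        out[i] = x[i]
    if k2 > 0:                         # randomized coordinate part
        scale = len(rest) / k2
        for i in rng.sample(rest, k2):
            out[i] = scale * x[i]
    return out
```

For example, `random_top_k(g_small, g_large, k1=2, k2=1)` returns a vector with at most three nonzero entries, where `g_large` plays the role of the large batch gradient that indicates which coordinates matter. Passing `rng=random.Random(seed)` makes the random selection reproducible.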

algorithm, operator, spiderboost

arXiv: 2001.09623

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization

Wang, Zhe, Ji, Kaiyi, Zhou, Yi, Liang, Yingbin, Tarokh, Vahid

arXiv.org Machine Learning, Oct-24-2018

There has been extensive research on developing stochastic variance reduced methods to solve large-scale optimization problems. More recently, a novel algorithm of this type named SPIDER was developed by Fang et al. (2018), which was shown to outperform existing algorithms of the same type and meet the lower bound in certain regimes. Though interesting in theory, SPIDER requires an $\epsilon$-level stepsize to guarantee convergence, and consequently runs slowly in practice. This paper proposes SpiderBoost as an improved SPIDER scheme, which comes with two major advantages compared to SPIDER. First, it allows a much larger stepsize without sacrificing the convergence rate, and hence runs substantially faster than SPIDER in practice. Second, it extends much more easily to proximal algorithms with guaranteed convergence for solving composite optimization problems, which appears challenging for SPIDER due to its stringent requirement on the per-iteration increment to guarantee convergence. Both advantages can be attributed to the new convergence analysis we develop for SpiderBoost, which allows much more flexibility in choosing algorithm parameters. As a further generalization of SpiderBoost, we show that proximal SpiderBoost achieves a stochastic first-order oracle (SFO) complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-1},\epsilon^{-3/2}\})$ for composite optimization, which improves the existing best results by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/6}\})$.
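The variance-reduced update with a constant stepsize that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the SPIDER-style recursive estimator $v_k = \nabla f_S(x_k) - \nabla f_S(x_{k-1}) + v_{k-1}$, refreshed with a full gradient every `q` iterations, and assumes the parameter choices `q = |S| ≈ sqrt(n)` and stepsize `1 / (2L)`. The toy objective (an average of least-squares components, which is convex rather than nonconvex) and all function names are hypothetical.

```python
import math
import random


def grad_i(x, a, b):
    """Gradient of one component f_i(x) = 0.5 * (a . x - b)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [r * aj for aj in a]


def batch_grad(x, A, b, idx):
    """Average of the component gradients over the indices in idx."""
    idx = list(idx)
    g = [0.0] * len(x)
    for i in idx:
        gi = grad_i(x, A[i], b[i])
        g = [gj + gij / len(idx) for gj, gij in zip(g, gi)]
    return g


def spiderboost(A, b, x0, lipschitz, iters, rng):
    """SpiderBoost-style loop on a toy least-squares problem (sketch)."""
    n = len(A)
    q = max(1, int(math.sqrt(n)))    # assumed epoch length and batch size
    eta = 1.0 / (2.0 * lipschitz)    # constant stepsize, independent of epsilon
    x, x_prev, v = list(x0), list(x0), None
    for k in range(iters):
        if k % q == 0:
            # Start of an epoch: reset the estimator with the full gradient.
            v = batch_grad(x, A, b, range(n))
        else:
            # Recursive correction using the same batch at x and x_prev.
            S = [rng.randrange(n) for _ in range(q)]
            g_new = batch_grad(x, A, b, S)
            g_old = batch_grad(x_prev, A, b, S)
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
        x_prev = x
        x = [xj - eta * vj for xj, vj in zip(x, v)]
    return x
```

Here `lipschitz` stands for a bound on the gradient Lipschitz constant of the components; for this toy objective the largest squared row norm of `A` is a valid choice. The point of the sketch is that `eta` does not shrink with the target accuracy, which is the practical difference from SPIDER that the abstract highlights.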

artificial intelligence, machine learning, optimization

arXiv: 1810.1069

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
