AITopics | bb step size

Collaborating Authors

bb step size

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian

Neural Information Processing SystemsNov-21-2025, 09:51:38 GMT

SGD that can reduce the variance and improve the complexity.

artificial intelligence, machine learning, step size, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Neural Information Processing SystemsMar-12-2024, 17:29:49 GMT

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

gradient, step size, svrg-bb, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Fast Stochastic Ordinal Embedding with Variance Reduction and Adaptive Step Size

Ma, Ke, Zeng, Jinshan, Xu, Qianqian, Cao, Xiaochun, Liu, Wei, Yao, Yuan

arXiv.org Machine LearningDec-1-2019

Most of the existing methods are based on semi-definite programming ( SDP), which is generally time-consuming and degrades the scalability, especially confronting large-scale data. T o overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: i) achieving good scalability via dropping positive semi-definite ( PSD) constraints as serving a fast algorithm, i.e., stochastic variance reduced gradient ( SVRG) method, and ii) adaptive learning via introducing a new, adaptive step size called the stabilized Barzilai-Borwein ( SBB) step size. Theoretically, under some natural assumptions, we show the O ( 1 T) rate of convergence to a stationary point of the proposed algorithm, where T is the number of total iterations. Under the further Polyak- Łojasiewicz assumption, we can show the global linear convergence (i.e., exponentially fast converging to a global optimum) of the proposed algorithm. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm by comparing with the state-of-the-art methods, notably, much lower computational cost with good prediction performance. Index Terms --Ordinal Embedding, SVRG, Non-Convex Optimization, Barzilai-Borwein (BB) Step Size, .null 1 I NTRODUCTION O RDINAL embedding aims to learn the representation of data as points in a low-dimensional embedded space. Here the "low-dimensional" means the embedding-K. Ma is with the School of Computer Science and T echnology, University of Chinese Academy of Sciences, Beijing 100049, China, and with the Artificial Intelligence Research Center, Peng Cheng Laboratory, Shenzhen 518055, China, and part of this work was performed when he was in the Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China, and in the School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China. Email: make@ucas.ac.cn - J. Zeng is with the School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China, and part of this work was performed when he was with the Department of Mathematics, Hong Kong University of Science and T echnology, Clear Water Bay, Kowloon, Hong Kong.

ncvx svrg-sbb, ordinal, step size, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2019.2956700

1912.00362

Country:

Asia > China > Beijing > Beijing (0.64)
Asia > China > Guangdong Province > Shenzhen (0.24)
Asia > China > Hong Kong > Kowloon (0.24)
(3 more...)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Adaptive Step Sizes in Variance Reduction via Regularization

Li, Bingcong, Giannakis, Georgios B.

arXiv.org Machine LearningOct-15-2019

The main goal of this work is equipping convex and nonconvex problems with Barzilai-Borwein (BB) step size. With the adaptivity of BB step sizes granted, they can fail when the objective function is not strongly convex. To overcome this challenge, the key idea here is to bridge (non)convex problems and strongly convex ones via regularization. The proposed regularization schemes are \textit{simple} yet effective. Wedding the BB step size with a variance reduction method, known as SARAH, offers a free lunch compared with vanilla SARAH in convex problems. The convergence of BB step sizes in nonconvex problems is also established and its complexity is no worse than other adaptive step sizes such as AdaGrad. As a byproduct, our regularized SARAH methods for convex functions ensure that the complexity to find $\mathbb{E}[\| \nabla f(\mathbf{x}) \|^2]\leq \epsilon$ is ${\cal O}\big( (n+\frac{1}{\sqrt{\epsilon}})\ln{\frac{1}{\epsilon}}\big)$, improving $\epsilon$ dependence over existing results. Numerical tests further validate the merits of proposed approaches.

bb step size, complexity, step size, (13 more...)

arXiv.org Machine Learning

1910.06532

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Nevada (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Almost Tune-Free Variance Reduction

Li, Bingcong, Wang, Lingda, Giannakis, Georgios B.

arXiv.org Machine LearningAug-25-2019

The variance reduction class of algorithms including the representative ones, abbreviated as SVRG and SARAH, have well documented merits for empirical risk minimization tasks. However, they require grid search to optimally tune parameters (step size and the number of iterations per inner loop) for best performance. This work introduces `almost tune-free' SVRG and SARAH schemes by equipping them with Barzilai-Borwein (BB) step sizes. To achieve the best performance, both i) averaging schemes; and, ii) the inner loop length are adjusted according to the BB step size. SVRG and SARAH are first reexamined through an `estimate sequence' lens. Such analysis provides new averaging methods that tighten the convergence rates of both SVRG and SARAH theoretically, and improve their performance empirically when the step size is chosen large. Then a simple yet effective means of adjusting the number of iterations per inner loop is developed, which completes the tune-free variance reduction together with BB step sizes. Numerical tests corroborate the proposed methods.

artificial intelligence, machine learning, step size, (17 more...)

arXiv.org Machine Learning

1908.09345

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Stochastic Non-Convex Ordinal Embedding With Stabilized Barzilai-Borwein Step Size

Ma, Ke (Institute of Information Engineering, Chinese Academy of Sciences) | Zeng, Jinshan (School of Cyber Security, University of Chinese Academy of Sciences) | Xiong, Jiechao (School of Computer Information Engineering, Jiangxi Normal University) | Xu, Qianqian (Hong Kong University of Science and Technology) | Cao, Xiaochun (Tencent AI Lab) | Liu, Wei (Institute of Information Engineering, Chinese Academy of Sciences) | Yao, Yuan (Institute of Information Engineering, Chinese Academy of Sciences)

AAAI ConferencesFeb-8-2018

Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are batch methods designed mainly based on the convex optimization, say, the projected gradient descent method. However, they are generally time-consuming due to that the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) SVD-free via dropping convexity, with good scalability by the use of stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) adaptive step size choice via introducing a new stabilized Barzilai-Borwein (SBB) method as the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate O (1/ T ) in our setting, where T is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm via comparing with the state-of-the-art methods, particularly, much lower computational cost with good prediction performance.

artificial intelligence, machine learning, step size, (17 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Add feedback

Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

Ma, Ke, Zeng, Jinshan, Xiong, Jiechao, Xu, Qianqian, Cao, Xiaochun, Liu, Wei, Yao, Yuan

arXiv.org Machine LearningJan-30-2018

Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are batch methods designed mainly based on the convex optimization, say, the projected gradient descent method. However, they are generally time-consuming due to that the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) SVD-free via dropping convexity, with good scalability by the use of stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) adaptive step size choice via introducing a new stabilized Barzilai-Borwein (SBB) method as the original version for convex problems might fail for the considered stochastic \textit{non-convex} optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm via comparing with the state-of-the-art methods, particularly, much lower computational cost with good prediction performance.

artificial intelligence, machine learning, step size, (16 more...)

arXiv.org Machine Learning

1711.06446

Country: Asia > China (0.14)

Genre: Research Report (0.70)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu

Neural Information Processing SystemsDec-31-2016

gradient, step size, svrg-bb, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu

arXiv.org Machine LearningMay-22-2016

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result is missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

artificial intelligence, machine learning, step size, (15 more...)

arXiv.org Machine Learning

1605.04131

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback