bb step size
Barzilai-Borwein Step Size for Stochastic Gradient Descent
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.
Fast Stochastic Ordinal Embedding with Variance Reduction and Adaptive Step Size
Ma, Ke, Zeng, Jinshan, Xu, Qianqian, Cao, Xiaochun, Liu, Wei, Yao, Yuan
Most of the existing methods are based on semi-definite programming ( SDP), which is generally time-consuming and degrades the scalability, especially confronting large-scale data. T o overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: i) achieving good scalability via dropping positive semi-definite ( PSD) constraints as serving a fast algorithm, i.e., stochastic variance reduced gradient ( SVRG) method, and ii) adaptive learning via introducing a new, adaptive step size called the stabilized Barzilai-Borwein ( SBB) step size. Theoretically, under some natural assumptions, we show the O ( 1 T) rate of convergence to a stationary point of the proposed algorithm, where T is the number of total iterations. Under the further Polyak- ลojasiewicz assumption, we can show the global linear convergence (i.e., exponentially fast converging to a global optimum) of the proposed algorithm. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm by comparing with the state-of-the-art methods, notably, much lower computational cost with good prediction performance. Index Terms --Ordinal Embedding, SVRG, Non-Convex Optimization, Barzilai-Borwein (BB) Step Size, .null 1 I NTRODUCTION O RDINAL embedding aims to learn the representation of data as points in a low-dimensional embedded space. Here the "low-dimensional" means the embedding-K. Ma is with the School of Computer Science and T echnology, University of Chinese Academy of Sciences, Beijing 100049, China, and with the Artificial Intelligence Research Center, Peng Cheng Laboratory, Shenzhen 518055, China, and part of this work was performed when he was in the Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China, and in the School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China. Email: make@ucas.ac.cn - J. Zeng is with the School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China, and part of this work was performed when he was with the Department of Mathematics, Hong Kong University of Science and T echnology, Clear Water Bay, Kowloon, Hong Kong.
Adaptive Step Sizes in Variance Reduction via Regularization
Li, Bingcong, Giannakis, Georgios B.
The main goal of this work is equipping convex and nonconvex problems with Barzilai-Borwein (BB) step size. With the adaptivity of BB step sizes granted, they can fail when the objective function is not strongly convex. To overcome this challenge, the key idea here is to bridge (non)convex problems and strongly convex ones via regularization. The proposed regularization schemes are \textit{simple} yet effective. Wedding the BB step size with a variance reduction method, known as SARAH, offers a free lunch compared with vanilla SARAH in convex problems. The convergence of BB step sizes in nonconvex problems is also established and its complexity is no worse than other adaptive step sizes such as AdaGrad. As a byproduct, our regularized SARAH methods for convex functions ensure that the complexity to find $\mathbb{E}[\| \nabla f(\mathbf{x}) \|^2]\leq \epsilon$ is ${\cal O}\big( (n+\frac{1}{\sqrt{\epsilon}})\ln{\frac{1}{\epsilon}}\big)$, improving $\epsilon$ dependence over existing results. Numerical tests further validate the merits of proposed approaches.
Almost Tune-Free Variance Reduction
Li, Bingcong, Wang, Lingda, Giannakis, Georgios B.
The variance reduction class of algorithms including the representative ones, abbreviated as SVRG and SARAH, have well documented merits for empirical risk minimization tasks. However, they require grid search to optimally tune parameters (step size and the number of iterations per inner loop) for best performance. This work introduces `almost tune-free' SVRG and SARAH schemes by equipping them with Barzilai-Borwein (BB) step sizes. To achieve the best performance, both i) averaging schemes; and, ii) the inner loop length are adjusted according to the BB step size. SVRG and SARAH are first reexamined through an `estimate sequence' lens. Such analysis provides new averaging methods that tighten the convergence rates of both SVRG and SARAH theoretically, and improve their performance empirically when the step size is chosen large. Then a simple yet effective means of adjusting the number of iterations per inner loop is developed, which completes the tune-free variance reduction together with BB step sizes. Numerical tests corroborate the proposed methods.
Stochastic Non-Convex Ordinal Embedding With Stabilized Barzilai-Borwein Step Size
Ma, Ke (Institute of Information Engineering, Chinese Academy of Sciences) | Zeng, Jinshan (School of Cyber Security, University of Chinese Academy of Sciences) | Xiong, Jiechao (School of Computer Information Engineering, Jiangxi Normal University) | Xu, Qianqian (Hong Kong University of Science and Technology) | Cao, Xiaochun (Tencent AI Lab) | Liu, Wei (Institute of Information Engineering, Chinese Academy of Sciences) | Yao, Yuan (Institute of Information Engineering, Chinese Academy of Sciences)
Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are batch methods designed mainly based on the convex optimization, say, the projected gradient descent method. However, they are generally time-consuming due to that the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) SVD-free via dropping convexity, with good scalability by the use of stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) adaptive step size choice via introducing a new stabilized Barzilai-Borwein (SBB) method as the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate O (1/ T ) in our setting, where T is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm via comparing with the state-of-the-art methods, particularly, much lower computational cost with good prediction performance.
Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size
Ma, Ke, Zeng, Jinshan, Xiong, Jiechao, Xu, Qianqian, Cao, Xiaochun, Liu, Wei, Yao, Yuan
Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are batch methods designed mainly based on the convex optimization, say, the projected gradient descent method. However, they are generally time-consuming due to that the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) SVD-free via dropping convexity, with good scalability by the use of stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) adaptive step size choice via introducing a new stabilized Barzilai-Borwein (SBB) method as the original version for convex problems might fail for the considered stochastic \textit{non-convex} optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm via comparing with the state-of-the-art methods, particularly, much lower computational cost with good prediction performance.
Barzilai-Borwein Step Size for Stochastic Gradient Descent
Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.
Barzilai-Borwein Step Size for Stochastic Gradient Descent
Tan, Conghui, Ma, Shiqian, Dai, Yu-Hong, Qian, Yuqiu
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result is missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.