Error Bound Condition


Adaptive SVRG Methods under Error Bound Conditions with Unknown Growth Parameter

Xu, Yi, Lin, Qihang, Yang, Tianbao

Neural Information Processing Systems

Error bound, an inherent property of an optimization problem, has recently been revived in the development of algorithms with improved global convergence without strong convexity. The most studied error bound is the quadratic error bound, which generalizes strong convexity and is satisfied by a large family of machine learning problems. The quadratic error bound has been leveraged to achieve linear convergence in many first-order methods, including the stochastic variance reduced gradient (SVRG) method, one of the most important stochastic optimization methods in machine learning. However, the studies along this direction face a critical issue: the algorithms must depend on an unknown growth parameter (a generalization of the strong convexity modulus) in the error bound. This parameter is difficult to estimate exactly, and algorithms that choose it heuristically do not have a theoretical convergence guarantee. To address this issue, we propose novel SVRG methods that automatically search for this unknown parameter on the fly during optimization while still obtaining almost the same convergence rate as when the parameter is known. We also analyze the convergence of SVRG methods under the Hölderian error bound, which generalizes the quadratic error bound.
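For concreteness, below is a minimal sketch of a plain SVRG epoch loop in Python. It is not the adaptive method proposed in the paper; the helper name grad_i and the step size and inner-loop length are illustrative choices. In quadratic-error-bound analyses, it is exactly these epoch parameters that depend on the unknown growth parameter, which the paper's methods search for automatically.

    import numpy as np

    def svrg(grad_i, w0, n, step=0.1, epochs=10, inner_iters=None, rng=None):
        """Plain SVRG epoch loop (a generic sketch, not the paper's adaptive method).

        grad_i(w, i): gradient of the i-th component function at w (illustrative name).
        step / inner_iters: fixed by hand here; in quadratic-error-bound analyses
        these are the quantities that depend on the unknown growth parameter.
        """
        rng = rng or np.random.default_rng()
        inner_iters = inner_iters or 2 * n
        w_ref = np.asarray(w0, dtype=float)  # snapshot (reference) point
        for _ in range(epochs):
            # Full gradient at the snapshot, recomputed once per epoch.
            full_grad = sum(grad_i(w_ref, i) for i in range(n)) / n
            w = w_ref.copy()
            for _ in range(inner_iters):
                i = rng.integers(n)
                # Variance-reduced stochastic gradient estimate.
                g = grad_i(w, i) - grad_i(w_ref, i) + full_grad
                w = w - step * g
            w_ref = w  # next snapshot (averaging the inner iterates is also common)
        return w_ref

As a usage example, for a least-squares objective one could pass grad_i = lambda w, i: (x[i] @ w - y[i]) * x[i] for data arrays x and y.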


Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions

Liu, Mingrui, Zhang, Xiaoxuan, Zhang, Lijun, Jin, Rong, Yang, Tianbao

Neural Information Processing Systems

Error bound conditions (EBC) are properties that characterize the growth of an objective function when a point is moved away from the optimal set. They have recently received increasing attention in the field of optimization for developing optimization algorithms with fast convergence. However, studies of EBC in statistical learning are hitherto still limited. The main contributions of this paper are two-fold. First, we develop fast and intermediate rates of empirical risk minimization (ERM) under EBC for risk minimization with Lipschitz continuous random functions and with smooth convex random functions. Second, we establish fast and intermediate rates of an efficient stochastic approximation (SA) algorithm for risk minimization with Lipschitz continuous random functions, which requires only one pass over the $n$ samples and adapts to EBC. For both approaches, the convergence rates span a full spectrum between $\widetilde O(1/\sqrt{n})$ and $\widetilde O(1/n)$ depending on the power constant in EBC, and can be even faster than $O(1/n)$ in special cases of ERM. Moreover, these convergence rates are automatically adaptive without using any knowledge of EBC. Overall, this work not only strengthens the understanding of ERM for statistical learning but also brings new fast stochastic algorithms for solving a broad range of statistical learning problems.
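To make the interpolation concrete, the display below states one common form of EBC with power constant $\theta$ and the corresponding rate family. The normalization is an assumption for illustration rather than a quotation of the paper's exact condition, but it reproduces the two endpoints quoted in the abstract.

\[
\operatorname{dist}^2(\mathbf{w}, \mathcal{S}_*) \;\le\; c\,\bigl(F(\mathbf{w}) - F_*\bigr)^{\theta}, \qquad \theta \in (0, 1],
\]
\[
\text{excess risk} \;=\; \widetilde O\bigl(n^{-1/(2-\theta)}\bigr): \qquad \theta \to 0 \text{ gives } \widetilde O(1/\sqrt{n}), \qquad \theta = 1 \text{ gives } \widetilde O(1/n).
\]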


Reviews: Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions

Neural Information Processing Systems

Overview: This paper develops fast excess risk bounds for ERM and develops stochastic optimization algorithms for losses that (at the population level) satisfy the so-called "error bound condition". The error bound condition generalizes a particular inequality implied by strong convexity to 1) non-convex but still curved losses and 2) convex losses that are not strongly convex but enjoy similar (possibly slower) growth around the optimum. The authors give 1) an ERM guarantee that interpolates between \sqrt{d/n} and d/n as a function of the exponent for which the EBC holds, and does not assume convexity; 2) a faster/optimistic rate that depends on the loss of the benchmark, but requires convexity in addition to EBC; and 3) a stochastic optimization routine matching the (not optimistic) guarantee 1).

Originality and Significance: My takeaway is that the results in this paper do not require huge changes in proof technique compared to previous results that assume strong convexity/convexity (e.g., van Erven et al., "Fast rates in statistical and online learning" for Theorem 1 and Zhang et al., "Empirical risk minimization for stochastic convex optimization: O(1/n)- and O(1/n^2)-type of risk bounds" for Theorem 2), but I do like that the paper is fairly comprehensive and I think it will probably serve as a useful reference point for future research.

Some further comments: * Theorem 2 is fairly restrictive since it has to assume convexity, but I see from the examples that there are indeed losses that satisfy EBC for theta > 0 yet are not strongly convex.
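For reference, the inequality the reviewer alludes to can be written as follows; the normalization is assumed for illustration rather than taken from the paper.

\[
F \text{ is } \lambda\text{-strongly convex} \;\Longrightarrow\; F(\mathbf{w}) - F_* \;\ge\; \tfrac{\lambda}{2}\,\operatorname{dist}^2(\mathbf{w}, \mathcal{S}_*),
\]
\[
\text{and EBC relaxes this to } \operatorname{dist}^2(\mathbf{w}, \mathcal{S}_*) \;\le\; c\,\bigl(F(\mathbf{w}) - F_*\bigr)^{\theta}, \qquad \theta \in (0, 1],
\]
so the strongly convex case corresponds to $\theta = 1$ and $c = 2/\lambda$, while non-strongly-convex and some non-convex losses with comparable growth around the optimal set are also covered.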

