Collaborating Authors

Linear Convergence of SVRG in Statistical Estimation Machine Learning

SVRG and its variants are among the state of art optimization algorithms for large scale machine learning problems. It is well known that SVRG converges linearly when the objective function is strongly convex. However this setup can be restrictive, and does not include several important formulations such as Lasso, group Lasso, logistic regression, and some non-convex models including corrected Lasso and SCAD. In this paper, we prove that, for a class of statistical M-estimators covering examples mentioned above, SVRG solves the formulation with {\em a linear convergence rate} without strong convexity or even convexity. Our analysis makes use of {\em restricted strong convexity}, under which we show that SVRG converges linearly to the fundamental statistical precision of the model, i.e., the difference between true unknown parameter $\theta^*$ and the optimal solution $\hat{\theta}$ of the model.

Minimum Distance Estimation for Robust High-Dimensional Regression Machine Learning

We propose a minimum distance estimation method for robust regression in sparse high-dimensional settings. The traditional likelihood-based estimators lack resilience against outliers, a critical issue when dealing with high-dimensional noisy data. Our method, Minimum Distance Lasso (MD-Lasso), combines minimum distance functionals, customarily used in nonparametric estimation for their robustness, with l1-regularization for high-dimensional regression. The geometry of MD-Lasso is key to its consistency and robustness. The estimator is governed by a scaling parameter that caps the influence of outliers: the loss per observation is locally convex and close to quadratic for small squared residuals, and flattens for squared residuals larger than the scaling parameter. As the parameter approaches infinity, the estimator becomes equivalent to least-squares Lasso. MD-Lasso enjoys fast convergence rates under mild conditions on the model error distribution, which hold for any of the solutions in a convexity region around the true parameter and in certain cases for every solution. Remarkably, a first-order optimization method is able to produce iterates very close to the consistent solutions, with geometric convergence and regardless of the initialization. A connection is established with re-weighted least-squares that intuitively explains MD-Lasso robustness. The merits of our method are demonstrated through simulation and eQTL data analysis.

Beyond Convexity: Online Submodular Minimization

Neural Information Processing Systems

We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. Papers published at the Neural Information Processing Systems Conference.

Enumerate Lasso Solutions for Feature Selection

AAAI Conferences

We propose an algorithm for enumerating solutions to the Lasso regression problem.In ordinary Lasso regression, one global optimum is obtained and the resulting features are interpreted as task-relevant features.However, this can overlook possibly relevant features not selected by the Lasso.With the proposed method, we can enumerate many possible feature sets for human inspection, thus recording all the important features.We prove that by enumerating solutions, we can recover a true feature set exactly under less restrictive conditions compared with the ordinary Lasso.We confirm our theoretical results also in numerical simulations.Finally, in the gene expression and the text data, we demonstrate that the proposed method can enumerate a wide variety of meaningful feature sets, which are overlooked by the global optima.

Improved Dynamic Regret for Non-degenerate Functions

Neural Information Processing Systems

Recently, there has been a growing research interest in the analysis of dynamic regret, which measures the performance of an online learner against a sequence of local minimizers. By exploiting the strong convexity, previous studies have shown that the dynamic regret can be upper bounded by the path-length of the comparator sequence. In this paper, we illustrate that the dynamic regret can be further improved by allowing the learner to query the gradient of the function multiple times, and meanwhile the strong convexity can be weakened to other non-degenerate conditions. Specifically, we introduce the squared path-length, which could be much smaller than the path-length, as a new regularity of the comparator sequence. When multiple gradients are accessible to the learner, we first demonstrate that the dynamic regret of strongly convex functions can be upper bounded by the minimum of the path-length and the squared path-length.