Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm

Deguang Kong, Ryohei Fujimaki, Ji Liu, Feiping Nie, Chris Ding

Neural Information Processing Systems

Group LASSO is widely used to enforce structural sparsity, achieving sparsity at the inter-group level. In this paper, we propose a new formulation called "exclusive group LASSO", which brings out sparsity at the intra-group level in the context of feature selection. The proposed exclusive group LASSO is applicable to arbitrary feature structures, whether overlapping or non-overlapping. We analyze the properties of exclusive group LASSO and propose an effective iteratively re-weighted algorithm, with rigorous convergence analysis, to solve the corresponding optimization problem. We show applications of exclusive group LASSO to uncorrelated feature selection. Extensive experiments on both synthetic and real-world datasets validate the proposed method.
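
For intuition, here is a minimal NumPy sketch of the iteratively re-weighted scheme the abstract describes, for the exclusive penalty $\sum_g \|w_g\|_1^2$. The function name, the least-squares warm start, and the eps-smoothing are our own illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def exclusive_lasso_irls(X, y, groups, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively re-weighted least squares for the exclusive (l_{1,2}) penalty
    sum_g ||w_g||_1^2.  `groups` is a list of index arrays; groups may overlap."""
    n, d = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares warm start
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # Re-weighting: f_j = sum over groups containing j of ||w_g||_1 / |w_j|,
        # smoothed by eps so that zero coefficients do not blow up the weights.
        f = np.zeros(d)
        for g in groups:
            f[g] += np.sum(np.abs(w[g])) / (np.abs(w[g]) + eps)
        # Closed-form minimizer of the re-weighted ridge surrogate.
        w = np.linalg.solve(XtX + lam * np.diag(f), Xty)
    return w
```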


Adaptive Iterative Soft-Thresholding Algorithm with the Median Absolute Deviation

Feng, Yining, Selesnick, Ivan

arXiv.org Machine Learning

The adaptive Iterative Soft-Thresholding Algorithm (ISTA) has been a popular algorithm for finding a desirable solution to the LASSO problem without explicitly tuning the regularization parameter λ. Although the adaptive ISTA is a successful practical algorithm, few theoretical results exist. In this paper, we present a theoretical analysis of the adaptive ISTA with the thresholding strategy of estimating the noise level by the median absolute deviation. We show properties of the fixed points of the algorithm, including scale equivariance, non-uniqueness, and local stability, prove a local linear convergence guarantee, and show its global convergence behavior. Many sparse approximation problems in machine learning and signal processing can be obtained as the solution to the LASSO problem, which can be solved by ISTA. The obtained LASSO solution is optimal in the mean-squared-error (MSE) sense under minimal assumptions, but LARS is not competitive in terms of computation time for large-scale problems [7].
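
As an illustration of the idea (not necessarily the paper's exact update rule), here is a sketch of ISTA where the threshold is re-estimated each iteration from the median absolute deviation; the constant k and the 0.6745 Gaussian consistency factor are conventional assumptions on our part.

```python
import numpy as np

def adaptive_ista_mad(X, y, n_iter=200, k=3.0):
    """ISTA with a soft-threshold set adaptively from the median absolute
    deviation (MAD) of the current iterate, avoiding a hand-tuned lambda."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        z = w - (X.T @ (X @ w - y)) / L        # gradient step
        sigma = np.median(np.abs(z - np.median(z))) / 0.6745  # MAD noise estimate
        tau = k * sigma / L                    # adaptive threshold
        w = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)     # soft-thresholding
    return w
```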


Lecture Notes on High Dimensional Linear Regression

Quaini, Alberto

arXiv.org Machine Learning

These lecture notes were developed for a Master's course in advanced machine learning at Erasmus University Rotterdam. The course is designed for graduate students in mathematics, statistics, and econometrics. The content follows a proposition-proof structure, making it suitable for students seeking a formal and rigorous understanding of the statistical theory underlying machine learning methods. At present, the notes focus on linear regression, with an in-depth exploration of the existence, uniqueness, relations, computation, and nonasymptotic properties of the most prominent estimators in this setting: least squares, ridgeless, ridge, and lasso. It is assumed that readers have a solid background in calculus, linear algebra, convex analysis, and probability theory.
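
For reference, the four estimators the notes treat can be written in standard notation as follows (our transcription, up to the usual scaling conventions, not the notes' own formulas):

```latex
\hat\beta^{\mathrm{ols}} \in \operatorname*{arg\,min}_{\beta} \|y - X\beta\|_2^2, \qquad
\hat\beta^{\mathrm{ridge}}_{\lambda} = (X^\top X + \lambda I_d)^{-1} X^\top y, \\
\hat\beta^{\mathrm{ridgeless}} = X^{+} y = \lim_{\lambda \downarrow 0} \hat\beta^{\mathrm{ridge}}_{\lambda}, \qquad
\hat\beta^{\mathrm{lasso}}_{\lambda} \in \operatorname*{arg\,min}_{\beta} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1 .
```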


Solution-Set Geometry and Regularization Path of a Nonconvexly Regularized Convex Sparse Model

Zhang, Yi, Yamada, Isao

arXiv.org Artificial Intelligence

The generalized minimax concave (GMC) penalty is a nonconvex sparse regularizer which can preserve the overall-convexity of the regularized least-squares problem. In this paper, we focus on a significant instance of the GMC model termed scaled GMC (sGMC), and present various notable findings on its solution-set geometry and regularization path. Our investigation indicates that while the sGMC penalty is a nonconvex extension of the LASSO penalty (i.e., the $\ell_1$-norm), the sGMC model preserves many celebrated properties of the LASSO model, hence can serve as a less biased surrogate of LASSO without losing its advantages. Specifically, for a fixed regularization parameter $\lambda$, we show that the solution-set geometry, solution uniqueness and sparseness of the sGMC model can be characterized in a similar elegant way to the LASSO model (see, e.g., Osborne et al. 2000, R. J. Tibshirani 2013). For a varying $\lambda$, we prove that the sGMC solution set is a continuous polytope-valued mapping of $\lambda$. Most noticeably, our study indicates that similar to LASSO, the minimum $\ell_2$-norm regularization path of the sGMC model is continuous and piecewise linear in $\lambda$. Based on these theoretical results, an efficient regularization path algorithm is proposed for the sGMC model, extending the well-known least angle regression (LARS) algorithm for LASSO. We prove the correctness and finite termination of the proposed algorithm under a mild assumption, and confirm its correctness-in-general-situation, efficiency, and practical utility through numerical experiments. Many results in this study also contribute to the theoretical research of LASSO.
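
For context, the GMC penalty (introduced by Selesnick) that sGMC instantiates can be written as below; this is our transcription of the standard definition, with $A$ the design matrix and $B$ the matrix parameterizing the nonconvexity:

```latex
\psi_B(x) = \|x\|_1 - \min_{v \in \mathbb{R}^d} \Big\{ \|v\|_1 + \tfrac{1}{2}\|B(x - v)\|_2^2 \Big\},
\qquad
F(x) = \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\,\psi_B(x)
\;\text{ is convex if }\; B^\top B \preceq \tfrac{1}{\lambda} A^\top A .
```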


A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

Zeger, Emi, Wang, Yifei, Mishkin, Aaron, Ergen, Tolga, Candès, Emmanuel, Pilanci, Mert

arXiv.org Machine Learning

We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in ReLU networks, a fourth layer creates features that represent reflections of the training data about themselves.
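
To make the fixed-dictionary viewpoint concrete, here is an illustrative sketch for the 2-layer ReLU case on 1-D data: the candidate features are ReLU ramps with kinks at the training points, and fitting reduces to an ordinary Lasso over that dictionary. The dictionary construction below is our simplified illustration; the exact dictionary in the paper depends on the activation and depth.

```python
import numpy as np
from sklearn.linear_model import Lasso

def relu_dictionary(x):
    """ReLU ramp features with kinks at the training points, both orientations."""
    x = np.asarray(x).reshape(-1, 1)
    kinks = x.ravel()
    return np.hstack([np.maximum(x - kinks, 0.0),   # (x - x_i)_+
                      np.maximum(kinks - x, 0.0)])  # (x_i - x)_+

x = np.linspace(-1.0, 1.0, 20)
y = np.sin(3 * x)                                   # toy 1-D regression target
A = relu_dictionary(x)
model = Lasso(alpha=1e-3, fit_intercept=True, max_iter=50000).fit(A, y)
print(f"{np.sum(model.coef_ != 0)} active dictionary atoms")
```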


Inconsistency of cross-validation for structure learning in Gaussian graphical models

Lyu, Zhao, Tai, Wai Ming, Kolar, Mladen, Aragam, Bryon

arXiv.org Machine Learning

Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node within a Gaussian graphical model, optimized using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs, encompassing general, sparse covariance structures. To support our theoretical findings, we conduct an empirical investigation of this inconsistency by contrasting our outcomes with other commonly used information criteria through an extensive simulation study. Given that many algorithms designed to learn the structure of graphical models require hyperparameter selection, the precise calibration of this hyperparameter is paramount for accurately estimating the inherent structure. Consequently, our observations shed light on this widely recognized practical challenge.
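
For concreteness, the tuning procedure whose support recovery the paper studies looks roughly like the following sketch: Lasso neighborhood regression (in the style of Meinshausen-Bühlmann) with λ chosen by cross-validated prediction error. The function name and the choice of 5 folds are our own.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def cv_neighborhood(Z, node):
    """Estimate the neighborhood of `node` in a Gaussian graphical model by
    Lasso-regressing it on all other variables, with lambda chosen by CV."""
    y = Z[:, node]
    X = np.delete(Z, node, axis=1)
    fit = LassoCV(cv=5).fit(X, y)
    others = [j for j in range(Z.shape[1]) if j != node]
    return [others[j] for j in np.flatnonzero(fit.coef_)]
```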


Quantum Algorithms for the Pathwise Lasso

Doriguello, João F., Lim, Debbie, Pun, Chi Seng, Rebentrost, Patrick, Vaidya, Tushar

arXiv.org Machine Learning

We present a novel quantum high-dimensional linear regression algorithm with an $\ell_1$-penalty based on the classical LARS (Least Angle Regression) pathwise algorithm. Similarly to available classical numerical algorithms for Lasso, our quantum algorithm provides the full regularisation path as the penalty term varies, but quadratically faster per iteration under specific conditions. A quadratic speedup on the number of features/predictors $d$ is possible by using the simple quantum minimum-finding subroutine from Dürr and Hoyer (arXiv'96) in order to obtain the joining time at each iteration. We then improve upon this simple quantum algorithm and obtain a quadratic speedup both in the number of features $d$ and the number of observations $n$ by using the recent approximate quantum minimum-finding subroutine from Chen and de Wolf (ICALP'23). As one of our main contributions, we construct a quantum unitary based on quantum amplitude estimation to approximately compute the joining times to be searched over by the approximate quantum minimum finding. Since the joining times are no longer exactly computed, it is no longer clear that the resulting approximate quantum algorithm obtains a good solution. As our second main contribution, we prove, via an approximate version of the KKT conditions and a duality gap, that the LARS algorithm (and therefore our quantum algorithm) is robust to errors. This means that it still outputs a path that minimises the Lasso cost function up to a small error if the joining times are only approximately computed. Finally, in the model where the observations are generated by an underlying linear model with an unknown coefficient vector, we prove bounds on the difference between the unknown coefficient vector and the approximate Lasso solution, which generalises known results about convergence rates in classical statistical learning theory analysis.
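
The classical baseline being accelerated is the LARS regularisation path, which is a one-liner with scikit-learn; the toy data below is ours, purely for illustration.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Classical LARS baseline computing the full Lasso regularisation path.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.5, 1.0]                 # sparse ground-truth coefficients
y = X @ beta + 0.1 * rng.standard_normal(100)

alphas, active, coefs = lars_path(X, y, method="lasso")
print(f"path has {len(alphas)} breakpoints; final active set: {sorted(active)}")
```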


Research Papers based on Lasso Regression, part 2 (Machine Learning)

#artificialintelligence

The application of the lasso is espoused in high-dimensional settings where only a small number of the regression coefficients are believed to be nonzero. Moreover, statistical properties of high-dimensional lasso estimators are often proved under the assumption that the correlation between the predictors is bounded. In this vein, coordinatewise methods, the most common means of computing the lasso solution, work well in the presence of low to moderate multicollinearity. Motivated by these limitations, we propose the novel "Deterministic Bayesian Lasso" algorithm for computing the lasso solution. This algorithm is developed by considering a limiting version of the Bayesian lasso.
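
For reference, the "coordinatewise" baseline the abstract contrasts with is plain cyclic coordinate descent for the Lasso, sketched below (the proposed Deterministic Bayesian Lasso itself is not reproduced here).

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2)||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    r = y.copy()                       # residual y - Xw, maintained incrementally
    col_sq = np.sum(X ** 2, axis=0)    # per-column squared norms
    for _ in range(n_iter):
        for j in range(d):
            r += X[:, j] * w[j]        # remove coordinate j from the fit
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]        # add the updated coordinate back
    return w
```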