square solution
A Universal Analysis of Large-Scale Regularized Least Squares Solutions
A problem that has been of recent interest in statistical inference, machine learning and signal processing is that of understanding the asymptotic behavior of regularized least squares solutions under random measurement matrices (or dictionaries). The Least Absolute Shrinkage and Selection Operator (LASSO or least-squares with $\ell_1$ regularization) is perhaps one of the most interesting examples. Precise expressions for the asymptotic performance of LASSO have been obtained for a number of different cases, in particular when the elements of the dictionary matrix are sampled independently from a Gaussian distribution. It has also been empirically observed that the resulting expressions remain valid when the entries of the dictionary matrix are independently sampled from certain non-Gaussian distributions. In this paper, we confirm these observations theoretically when the distribution is sub-Gaussian. We further generalize the previous expressions for a broader family of regularization functions and under milder conditions on the underlying random, possibly non-Gaussian, dictionary matrix. In particular, we establish the universality of the asymptotic statistics (e.g., the average quadratic risk) of LASSO with non-Gaussian dictionaries.
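The LASSO objective discussed in this abstract can be made concrete with a small numerical sketch. The block below is an illustration only and is not taken from the paper: it assumes the common formulation 0.5 * ||y - A x||_2^2 + lam * ||x||_1 and solves it with plain iterative soft-thresholding (ISTA). The function names, problem sizes, noise level and `lam` value are all hypothetical choices. It then evaluates the average quadratic risk once for a Gaussian dictionary and once for a Rademacher (random sign) dictionary, which is one example of a sub-Gaussian distribution.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, iters=500):
    """Approximately minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)            # gradient of the least squares term
        x = soft_threshold(x - step * grad, step * lam)
    return x

def quadratic_risk(A, x_true, lam, noise_std, rng):
    """Average squared error of the LASSO estimate for one noisy draw."""
    y = A @ x_true + noise_std * rng.standard_normal(A.shape[0])
    return np.mean((lasso_ista(A, y, lam) - x_true) ** 2)

# Hypothetical experiment: same sparse signal, two dictionary distributions.
rng = np.random.default_rng(0)
m, n = 100, 200
x_true = np.zeros(n)
x_true[:10] = 1.0                                             # sparse ground truth
gaussian = rng.standard_normal((m, n)) / np.sqrt(m)
rademacher = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
print("Gaussian dictionary risk:  ", quadratic_risk(gaussian, x_true, 0.1, 0.05, rng))
print("Rademacher dictionary risk:", quadratic_risk(rademacher, x_true, 0.1, 0.05, rng))
```

A single draw at this size only hints at the effect. The universality statement in the abstract is asymptotic, so a fair comparison would average over many draws and grow `m` and `n` together.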
Where Have All the Kaczmarz Iterates Gone?
Bergou, El Houcine, Boucherouite, Soumia, Dutta, Aritra, Li, Xin, Ma, Anna
The randomized Kaczmarz (RK) algorithm is one of the most computationally and memory-efficient iterative algorithms for solving large-scale linear systems. However, practical applications often involve noisy and potentially inconsistent systems. While the convergence of RK is well understood for consistent systems, the study of RK on noisy, inconsistent linear systems is limited. This paper investigates the asymptotic behavior of RK iterates in expectation when solving noisy and inconsistent systems, addressing the locations of their limit points. We explore the roles of singular vectors of the (noisy) coefficient matrix and derive bounds on the convergence horizon, which depend on the noise levels and system characteristics. Finally, we provide extensive numerical experiments that validate our theoretical findings, offering practical insights into the algorithm's performance under realistic conditions. These results establish a deeper understanding of the RK algorithm's limitations and robustness in noisy environments, paving the way for optimized applications in real-world scientific and engineering problems.
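To make the RK iteration concrete, here is a minimal sketch. It assumes the standard variant in which row i is sampled with probability proportional to its squared norm and the iterate is projected onto the hyperplane of that row; the abstract does not state which sampling rule the paper analyzes. The function name, the small test system and the noise vector are hypothetical.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz: repeatedly project onto one randomly chosen row equation."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()   # assumed sampling rule: p_i ~ ||a_i||^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        # Orthogonal projection of x onto the hyperplane {z : a_i . z = b_i}
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
x_true = np.array([1.0, 2.0])
b = A @ x_true                                  # consistent right-hand side
b_noisy = b + np.array([0.1, -0.1, 0.1])        # hypothetical noise, system now inconsistent

err_consistent = np.linalg.norm(randomized_kaczmarz(A, b) - x_true)
err_noisy = np.linalg.norm(randomized_kaczmarz(A, b_noisy) - x_true)
print("error, consistent system:", err_consistent)
print("error, noisy system:     ", err_noisy)
```

On the consistent system the iterates converge to the solution. On the noisy system every iterate satisfies one perturbed equation exactly, so it cannot coincide with `x_true` and the error settles at a nonzero level, the kind of behavior the convergence horizon in the abstract describes.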
- Africa > Middle East > Morocco (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New York (0.04)
PAC Generalization via Invariant Representations
Parulekar, Advait, Shanmugam, Karthikeyan, Shakkottai, Sanjay
One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find invariant representations of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a finite-sample setting, we consider the notion of $\epsilon$-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that hold probabilistically over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > India (0.04)
Clustering, multicollinearity, and singular vectors
Let $A$ be a matrix with pseudoinverse $A^{\dagger}$, and set $S=I-A^{\dagger}A$. We prove that, after re-ordering the columns of $A$, the matrix $S$ has a block-diagonal form where each block corresponds to a set of linearly dependent columns. This allows us to identify redundant columns in $A$. We explore some applications in supervised and unsupervised learning, especially feature selection, clustering, and the sensitivity of least squares solutions.
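The construction in this abstract is easy to try numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it forms $S=I-A^{\dagger}A$ with NumPy's `pinv`, zeroes entries below a hypothetical tolerance `tol`, and reads off the blocks as connected components of the nonzero pattern of $S$. The helper names and the example matrix are made up for the demonstration.

```python
import numpy as np

def dependency_matrix(A, tol=1e-10):
    """S = I - pinv(A) @ A, the orthogonal projector onto the null space of A."""
    S = np.eye(A.shape[1]) - np.linalg.pinv(A) @ A
    S[np.abs(S) < tol] = 0.0        # suppress floating-point noise (tol is an assumption)
    return S

def dependent_column_groups(A, tol=1e-10):
    """Group column indices that are linked through nonzero entries of S."""
    S = dependency_matrix(A, tol)
    groups, seen = [], set()
    for j in range(S.shape[0]):
        if j in seen or S[j, j] == 0.0:   # zero diagonal: column j is in no dependency
            continue
        stack, component = [j], set()
        while stack:                       # depth-first search over the nonzero pattern
            k = stack.pop()
            if k in component:
                continue
            component.add(k)
            stack.extend(int(i) for i in np.flatnonzero(S[k]) if i not in component)
        seen |= component
        groups.append(sorted(component))
    return groups

# Hypothetical example: column 2 = column 0 + column 1, column 4 = 2 * column 3,
# and column 5 is independent of all the others.
A = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])
print(dependent_column_groups(A))
```

Because $S$ is an orthogonal projector, its diagonal entry $S_{jj}$ equals the squared norm of column $j$ of $S$, so a zero diagonal entry marks a column that takes part in no linear dependency. That is why the sketch skips those indices.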
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
- North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)
A Universal Analysis of Large-Scale Regularized Least Squares Solutions
Panahi, Ashkan, Hassibi, Babak
A problem that has been of recent interest in statistical inference, machine learning and signal processing is that of understanding the asymptotic behavior of regularized least squares solutions under random measurement matrices (or dictionaries). The Least Absolute Shrinkage and Selection Operator (LASSO or least-squares with $\ell_1$ regularization) is perhaps one of the most interesting examples. Precise expressions for the asymptotic performance of LASSO have been obtained for a number of different cases, in particular when the elements of the dictionary matrix are sampled independently from a Gaussian distribution. It has also been empirically observed that the resulting expressions remain valid when the entries of the dictionary matrix are independently sampled from certain non-Gaussian distributions. In this paper, we confirm these observations theoretically when the distribution is sub-Gaussian. We further generalize the previous expressions for a broader family of regularization functions and under milder conditions on the underlying random, possibly non-Gaussian, dictionary matrix.
On improving learning capability of ELM and an application to brain-computer interface
Yayık, Apdullah, Kutlu, Yakup, Altan, Gökhan
As a type of pseudoinverse learning, the extreme learning machine (ELM) is able to achieve high performance at a rapid pace on benchmark datasets. However, when it is applied to large real-life data, a decline related to the slow convergence of the singular value decomposition (SVD) method occurs. Our study aims to resolve this issue by replacing SVD with five methods that are theoretically and empirically much more efficient: lower-upper triangularization, Hessenberg decomposition, Schur decomposition, the modified Gram-Schmidt algorithm, and Householder reflection. Comparisons were made on an electroencephalography-based brain-computer interface classification problem to decide which method is the most useful. Results of subject-based classifications suggested that the Hessenberg decomposition method should be preferred if priority is given to training pace, whereas the Householder reflection method should be preferred if priority is given to performance.
- Europe > Finland > Uusimaa > Helsinki (0.05)
- North America > United States > New Jersey (0.04)
- Asia > Middle East > Republic of Türkiye > Hatay Province > Iskenderun (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
Lass-0: sparse non-convex regression by local search
Herlands, William, De-Arteaga, Maria, Neill, Daniel, Dubrawski, Artur
We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate handling of redundant features. Empirically, we use synthetic data to demonstrate that Lass-0 solutions are closer to the true sparse support than L1 regularization models. Additionally, in real-world data Lass-0 finds more parsimonious solutions than L1 regularization while maintaining similar predictive accuracy.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives
Freund, Robert M., Grigas, Paul, Mazumder, Rahul
In this paper we analyze boosting algorithms in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS$_\varepsilon$) and least squares boosting (LS-Boost($\varepsilon$)), can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a modification of FS$_\varepsilon$ that yields an algorithm for the Lasso, and that may be easily extended to an algorithm that computes the Lasso path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the Lasso may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-Boost($\varepsilon$) and FS$_\varepsilon$) by using techniques of modern first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
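For readers unfamiliar with FS$_\varepsilon$, the following is a minimal sketch of incremental forward stagewise regression as it is commonly described: at each step, find the feature most correlated with the current residual and move its coefficient by a small fixed amount `eps` in the direction of that correlation. The function name, default parameters and toy data are hypothetical, and the sketch assumes the columns of `X` are already standardized.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, iters=1000):
    """Incremental forward stagewise regression (FS_eps), a minimal sketch."""
    beta = np.zeros(X.shape[1])
    residual = y.astype(float)                # astype returns a copy, y is left untouched
    for _ in range(iters):
        corr = X.T @ residual                 # correlation of each feature with the residual
        j = int(np.argmax(np.abs(corr)))      # feature attaining the maximum absolute correlation
        delta = eps * np.sign(corr[j])
        beta[j] += delta                      # small step on that single coefficient
        residual -= delta * X[:, j]
    return beta

# Toy data with orthonormal features, so the least squares fit is known exactly.
X = np.eye(3)[:, :2]
y = X @ np.array([1.0, -0.5])
print(forward_stagewise(X, y))
```

The quantity max |X^T r| that drives the feature choice is the maximum absolute correlation loss named in the abstract, and each update moves a single coordinate by `eps`, which is how the method relates to a subgradient step on that loss. The number of iterations and the size of `eps` jointly control how far the coefficients travel from zero, the trade-off between data fidelity and regularization that the abstract refers to.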
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- South America > Chile (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Adaptive Template Matching with Shift-Invariant Semi-NMF
Roux, Jonathan L., Cheveigné, Alain D., Parra, Lucas C.
How does one extract unknown but stereotypical events that are linearly superimposed within a signal with variable latencies and variable amplitudes? One could think of using template matching or matching pursuit to find the arbitrarily shifted linear components. However, traditional matching approaches require that the templates be known a priori. To overcome this restriction we use instead semi Non-Negative Matrix Factorization (semi-NMF) that we extend to allow for time shifts when matching the templates to the signal. The algorithm estimates templates directly from the data along with their non-negative amplitudes. The resulting method can be thought of as an adaptive template matching procedure. We demonstrate the procedure on the task of extracting spikes from single channel extracellular recordings. On these data the algorithm essentially performs spike detection and unsupervised spike clustering. Results on simulated data and extracellular recordings indicate that the method performs well for signal-to-noise ratios of 6dB or higher and that spike templates are recovered accurately provided they are sufficiently different.
- North America > United States > New York (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > Germany (0.04)
- Europe > Denmark (0.04)
Space and camera path reconstruction for omni-directional vision
Knill, Oliver, Ramirez-Herran, Jose
In this paper, we address the inverse problem of reconstructing a scene as well as the camera motion from the image sequence taken by an omni-directional camera. Our structure-from-motion results give sharp conditions under which the reconstruction is unique. For example, if there are three points in general position and three omni-directional cameras in general position, a unique reconstruction is possible up to a similarity. We then look at the reconstruction problem with m cameras and n points, where n and m can be large and the over-determined system is solved by least squares methods. The reconstruction is robust and generalizes to the case of a dynamic environment where landmarks can move during the movie capture. Possible applications of the result are computer-assisted scene reconstruction, 3D scanning, autonomous robot navigation, medical tomography and city reconstructions.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York (0.04)
- North America > United States > Texas (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)