Stieltjes transform


Free Decompression with Algebraic Spectral Curves

arXiv.org Machine Learning

At the core of scientific computing and much of modern machine learning (ML) lies the challenge of estimating the eigenvalues of high-dimensional Hermitian matrices. Such matrices, including kernels, Hessians, and graph representations, encode the intrinsic geometry and connectivity of the data and of the models built on them, making efficient spectral techniques a primary concern for both theory and practice. Studying eigenspectra has become a prominent approach to understanding performance and guiding training in deep learning [10, 20, 36, 53]. In many cases, the spectra of such matrices have non-trivial structure, often containing spikes, multiple multi-modal bulks, and heavy tails [14, 25]. Conventional algorithms for extracting eigenvalue information from these matrices have required that the data be stored in memory or scratch space, or at least be accessible as an implicit operator (via matrix-vector products). More recently, a new class of algorithms has emerged that can provide highly accurate estimates of the eigenvalues (or summary functionals thereof [2]) of matrices even without implicit or explicit access to the full matrix, i.e., of so-called impalpable matrices [1]. One such method, termed Free Decompression (FD), shows great promise as a tool for gaining access to the spectral distributions of such impalpable matrices. The central premise is that, by appropriately sampling a small sub-matrix from the large impalpable matrix of interest, one can evolve a partial differential equation (PDE) for the Stieltjes transform of the spectral density, with the decompression ratio as the evolution variable, up to the desired matrix dimension.
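As a rough illustration of the object that this PDE evolves, the sketch below computes the empirical Stieltjes transform of a small symmetric matrix and recovers a smoothed spectral density from its imaginary part. It is not the Free Decompression method itself. The sign convention m(z) = (1/n) Σ 1/(λ_i − z), the smoothing width `eta`, the function names, and the Gaussian test matrix are all assumptions made for this example.

```python
import numpy as np

def empirical_stieltjes(eigs, z):
    """m(z) = (1/n) * sum_i 1 / (lambda_i - z), for complex z off the real axis.

    The sign convention is an assumption; some references use 1 / (z - lambda_i).
    """
    eigs = np.asarray(eigs, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=complex))
    return np.mean(1.0 / (eigs[None, :] - z[:, None]), axis=1)

def smoothed_density(eigs, x, eta=0.05):
    """Approximate the spectral density at real points x by Im m(x + i*eta) / pi."""
    return empirical_stieltjes(eigs, np.asarray(x) + 1j * eta).imag / np.pi

# Hypothetical stand-in for a small sampled sub-matrix: a symmetric Gaussian matrix.
rng = np.random.default_rng(0)
n = 200
a = rng.standard_normal((n, n))
sub = (a + a.T) / np.sqrt(2 * n)

eigs = np.linalg.eigvalsh(sub)
grid = np.linspace(-3.0, 3.0, 601)
rho = smoothed_density(eigs, grid)

# Riemann-sum check: the smoothed density should integrate to roughly one.
print("approximate total mass:", rho.sum() * (grid[1] - grid[0]))
```

A smaller `eta` gives a sharper but noisier density estimate, since each eigenvalue contributes a Cauchy bump of width `eta`.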






High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

arXiv.org Machine Learning

Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. In this paper, we study a data integration model in which two high-dimensional data matrices share a low-rank common latent structure while also containing individual-specific components. We analyze the singular vectors of the associated cross-covariance matrix using tools from random matrix theory and derive asymptotic characterizations of the alignment between estimated and true latent directions. These results provide a quantitative explanation of the reconstruction performance of the PLS variant based on Singular Value Decomposition (PLS-SVD) and identify regimes where the method exhibits counter-intuitive or limiting behavior. Building on this analysis, we compare PLS-SVD with principal component analysis applied separately to each dataset and show its asymptotic superiority in detecting the common latent subspace. Overall, our results offer a comprehensive theoretical understanding of high-dimensional PLS-SVD, clarifying both its advantages and fundamental limitations.
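The PLS-SVD procedure described in this abstract can be sketched on synthetic data as follows. This is a minimal illustration under assumed settings, not the paper's model: the rank-one shared factor, the signal `strength`, the dimensions, and the i.i.d. Gaussian noise are all hypothetical choices, and the individual-specific components mentioned in the abstract are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 2000, 50, 40          # samples and the two feature dimensions (assumed)
strength = 3.0                  # hypothetical signal strength

# True latent directions (unit norm) and a shared latent score per sample.
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
v = rng.standard_normal(q)
v /= np.linalg.norm(v)
t = rng.standard_normal(n)

# Paired datasets: shared rank-one structure plus independent Gaussian noise.
X = strength * np.outer(t, u) + rng.standard_normal((n, p))
Y = strength * np.outer(t, v) + rng.standard_normal((n, q))

# PLS-SVD: singular value decomposition of the sample cross-covariance matrix.
S = X.T @ Y / n
U, s, Vt = np.linalg.svd(S, full_matrices=False)

# Alignment between estimated and true directions (absolute cosine).
align_u = abs(U[:, 0] @ u)
align_v = abs(Vt[0] @ v)
print("leading singular value:", s[0])
print("alignments:", align_u, align_v)
```

Lowering `strength` or raising `p` and `q` relative to `n` in this sketch degrades the alignments, which is the kind of high-dimensional effect the abstract says the paper characterizes asymptotically.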


Nonlinear random matrix theory for deep learning

Neural Information Processing Systems

The list of successful applications of deep learning is growing at a staggering rate. Image recognition (Krizhevsky et al., 2012), audio synthesis (Oord et al., 2016), translation (Wu et al., 2016), and speech recognition (Hinton et al., 2012) are just a few of the recent achievements.


Asymptotic behavior of eigenvalues of large rank perturbations of large random matrices

arXiv.org Artificial Intelligence

Random Matrix Theory (RMT) is a classical theory that has been developing for more than 70 years. Initially, RMT arose from problems in nuclear physics and found its applications in mathematics, physics, finance, and many other disciplines. Recently, new problems have been arising from the area of Machine Learning. Indeed, often the weight matrices of Deep Neural Networks (DNNs) are initialized randomly. Moreover, modern DNNs have large weight matrices, which is why their spectral properties can be described by asymptotic behavior of N × N random matrices as N goes to infinity.
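To make the large-N limit concrete, the sketch below samples symmetric Gaussian matrices of growing size and prints spectral summaries that settle as N grows. It illustrates only the classical Wigner semicircle setting, not the large rank perturbations studied in the paper. The normalization (off-diagonal variance 1/N) and the function name `wigner_eigs` are assumptions for this example. Under that normalization the limiting semicircle law is supported on [-2, 2] with second moment 1 and fourth moment 2.

```python
import numpy as np

def wigner_eigs(n, rng):
    """Eigenvalues of a symmetric Gaussian matrix with off-diagonal variance 1/n."""
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2 * n)
    return np.linalg.eigvalsh(w)

rng = np.random.default_rng(0)
for n in (100, 400, 1000):
    eigs = wigner_eigs(n, rng)
    # Extreme eigenvalues approach -2 and 2; moments approach 1 and 2.
    print(n, eigs.min(), eigs.max(), np.mean(eigs**2), np.mean(eigs**4))
```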



Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

arXiv.org Machine Learning

Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data and rely on overparameterized models, where classical low-dimensional intuitions break down. In particular, the proportional regime, where the data dimension, sample size, and number of model parameters are all large and comparable, gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models such as DNNs in this regime. We introduce the concept of High-dimensional Equivalent, which unifies and generalizes both Deterministic Equivalent and Linear Equivalent, to systematically address three technical challenges: high dimensionality, nonlinearity, and the need to analyze generic eigenspectral functionals. Leveraging this framework, we provide precise characterizations of the training and generalization performance of linear models, nonlinear shallow networks, and deep networks. Our results capture rich phenomena, including scaling laws, double descent, and nonlinear learning dynamics, offering a unified perspective on the theoretical understanding of deep learning in high dimensions.