Performance of Rank-One Tensor Approximation on Incomplete Data
Lebeau, Hugo
We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral analysis gives access to the reconstruction performance. These results shed light on, and precisely quantify, the loss of performance induced by artificially reducing the memory cost of a tensor through the deletion of a random subset of its entries.
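This setting can be illustrated with a minimal numerical sketch: plant a rank-one signal in Gaussian noise, keep each entry with probability $\varepsilon$, and estimate a factor from the leading singular vector of an unfolding. All parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, eps = 60, 10.0, 0.5  # dimensions, SNR, observed fraction (illustrative)

# Planted rank-one signal x (x) y (x) z with unit-norm factors
x, y, z = [v / np.linalg.norm(v) for v in rng.standard_normal((3, n))]
T_full = beta * np.einsum('i,j,k->ijk', x, y, z) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)

# Keep each entry independently with probability eps (Bernoulli mask)
T = (rng.random((n, n, n)) < eps) * T_full

# Estimate x from the leading left singular vector of the mode-1 unfolding
u = np.linalg.svd(T.reshape(n, -1), full_matrices=False)[0][:, 0]
alignment = abs(u @ x)  # approaches 1 when reconstruction succeeds
```

Varying `eps` at fixed `beta` exhibits the performance loss the abstract refers to: below a critical observed fraction, `alignment` drops toward zero.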
Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering
Lebeau, Hugo, Chatelain, Florent, Couillet, Romain
The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
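The phenomenon can be observed in a small simulation (a sketch under assumed parameters, not the paper's setup): draw signal-plus-noise data from two classes, form the Gram kernel matrix, and inspect its dominant eigenvector, whose entries fluctuate around class-dependent means.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 200, 300                      # dimension and sample size (illustrative)
labels = np.repeat([-1, 1], n // 2)  # two balanced classes
mu = np.zeros(p)
mu[0] = 2.0                          # assumed class-mean separation

# Signal-plus-noise data: column i is labels[i] * mu + Gaussian noise
X = np.outer(mu, labels) + rng.standard_normal((p, n))

# Gram kernel matrix and its dominant eigenvector
K = X.T @ X / p
v = np.linalg.eigh(K)[1][:, -1]  # eigh returns eigenvalues in ascending order

# The entries of v fluctuate around class-dependent means (Gaussianly, per
# the CLT-like result); thresholding at zero then separates the classes
pred = np.sign(v)
accuracy = max(np.mean(pred == labels), np.mean(pred == -labels))
```

A histogram of the entries of `v`, split by class, shows two approximately Gaussian bumps, which is exactly what makes the classification error predictable.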
Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model
Lebeau, Hugo, Seddik, Mohamed El Amine, Goulart, José Henrique de Morais
We study the estimation of a planted signal hidden in a recently introduced nested matrix-tensor model, which is an extension of the classical spiked rank-one tensor model, motivated by multi-view clustering. Prior work has theoretically examined the performance of a tensor-based approach, which relies on finding a best rank-one approximation, a problem known to be computationally hard. A tractable alternative approach consists in computing instead the best rank-one (matrix) approximation of an unfolding of the observed tensor data, but its performance was hitherto unknown. We quantify here the performance gap between these two approaches, in particular by deriving the precise algorithmic threshold of the unfolding approach and demonstrating that it exhibits a BBP-type transition behavior (Baik et al., 2005). This work is therefore in line with recent contributions which deepen our understanding of why tensor-based methods surpass matrix-based methods in handling structured tensor data.

In the age of artificial intelligence, handling vast amounts of data has become a fundamental aspect of machine learning tasks. Datasets are often high-dimensional and composed of multiple modes, such as various modalities, sensors, sources, types, or domains, naturally lending themselves to be represented as tensors. Tensors offer a richer structure compared to traditional one-dimensional vectors and two-dimensional matrices, making them increasingly relevant in various applications, including statistical learning and data analysis (Landsberg, 2012; Sun et al., 2014). Yet, in the existing literature, there is a notable scarcity of theoretical studies that specifically address the performance gaps between tensor-based methods and traditional (matrix) spectral methods in the context of high-dimensional data analysis.
While tensor methods have shown promise in various applications, including multi-view clustering, co-clustering, community detection, and latent variable modeling (Wu et al., 2019; Anandkumar et al., 2014; Papalexakis et al., 2012; Wang et al., 2023), little attention has been devoted to rigorously quantifying the advantages and drawbacks of leveraging the hidden low-rank tensor structure.
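The two approaches being compared can be sketched on the simpler spiked rank-one tensor model (a stand-in for the nested model, with illustrative parameters): the unfolding approach takes one SVD of a matricized tensor, while the tensor approach runs alternating power iterations toward a best rank-one approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 50, 6.0  # illustrative size and signal-to-noise ratio

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
T = beta * np.einsum('i,j,k->ijk', x, x, x) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)

# Unfolding (matrix) approach: leading left singular vector of the
# n x n^2 mode-1 unfolding
u_mat = np.linalg.svd(T.reshape(n, -1), full_matrices=False)[0][:, 0]

# Tensor approach: alternating power iterations toward a best rank-one
# approximation, initialized with the unfolding estimate
u, v, w = u_mat.copy(), u_mat.copy(), u_mat.copy()
for _ in range(30):
    u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)

align_unfold, align_tensor = abs(u_mat @ x), abs(u @ x)
```

Sweeping `beta` downward makes the gap visible: the unfolding estimate loses correlation with `x` at a higher threshold than the (well-initialized) tensor iteration.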
A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation
Lebeau, Hugo, Chatelain, Florent, Couillet, Romain
This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computational threshold. Relying on standard tools from the theory of large random matrices, we characterize the large-dimensional spectral behavior of the unfoldings of the data tensor and exhibit relevant signal-to-noise ratios governing the detectability of the principal directions of the signal. These results make it possible to accurately predict the reconstruction performance of truncated multilinear SVD (MLSVD) in the non-trivial regime. This is particularly important since it serves as an initialization of the higher-order orthogonal iteration (HOOI) scheme, whose convergence to the best low-multilinear-rank approximation depends entirely on its initialization. We give a sufficient condition for the convergence of HOOI and show that the number of iterations before convergence tends to $1$ in the large-dimensional limit.
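The truncated-MLSVD-then-HOOI pipeline can be sketched as follows (a minimal implementation with assumed illustrative parameters, not the paper's exact setup): initialize each factor with the top-$r$ left singular vectors of the corresponding unfolding, then alternately re-estimate each factor from the tensor projected on the other two.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, beta = 40, 2, 8.0  # illustrative sizes and SNR

# Planted signal of multilinear rank (r, r, r): superdiagonal core,
# orthonormal factor matrices U[0], U[1], U[2]
U = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3)]
core = np.zeros((r, r, r))
core[np.arange(r), np.arange(r), np.arange(r)] = beta
T = np.einsum('abc,ia,jb,kc->ijk', core, *U) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)

def unfold(A, mode):
    """Mode-`mode` matricization: move that axis first, flatten the rest."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

# Truncated MLSVD: top-r left singular vectors of each unfolding
# (this is the HOOI initialization discussed in the abstract)
V = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
     for m in range(3)]

# HOOI: re-estimate each factor from the tensor projected on the other two
for _ in range(5):
    for m in range(3):
        others = [V[k] for k in range(3) if k != m]
        G = np.einsum('ijk,jb,kc->ibc', np.moveaxis(T, m, 0), *others)
        V[m] = np.linalg.svd(G.reshape(n, -1),
                             full_matrices=False)[0][:, :r]

# Subspace alignment with the planted factors (1 = perfect recovery)
align = [np.linalg.norm(V[m].T @ U[m]) / np.sqrt(r) for m in range(3)]
```

In line with the abstract's convergence claim, tracking `align` across sweeps in such experiments shows that, above the detectability threshold, the MLSVD initialization is already close and HOOI stabilizes after very few iterations.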