A statistical model for tensor PCA

Neural Information Processing Systems

We consider the Principal Component Analysis problem for large tensors of arbitrary order $k$ under a single-spike (or rank-one plus noise) model. On the one hand, we use information theory, and recent results in probability theory, to establish necessary and sufficient conditions under which the principal component can be estimated using unbounded computational resources. It turns out that this is possible as soon as the signal-to-noise ratio $\beta$ becomes larger than $C\sqrt{k\log k}$ (and in particular $\beta$ can remain bounded as the problem dimensions increase). On the other hand, we analyze several polynomial-time estimation algorithms, based on tensor unfolding, power iteration and message passing ideas from graphical models. We show that, unless the signal-to-noise ratio diverges in the system dimensions, none of these approaches succeeds. This is possibly related to a fundamental limitation of computationally tractable estimators for this problem. We discuss various initializations for tensor power iteration, and show that a tractable initialization based on the spectrum of the unfolded tensor significantly outperforms baseline methods, statistically and computationally. Finally, we consider the case in which additional side information is available about the unknown signal. We characterize the amount of side information that allows the iterative algorithms to converge to a good estimate.
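
As a rough illustration (not the paper's reference implementation), the following numpy sketch sets up an order-3 single-spike model and runs tensor power iteration from the unfolding-based spectral initialization described above. The noise normalization, iteration count, and parameter values are illustrative assumptions.

```python
import numpy as np

def spiked_tensor(n, beta, seed=0):
    """Order-3 spiked model: beta * v (x) v (x) v plus Gaussian noise.

    Noise entries are N(0, 1/n); this normalization is an illustrative choice.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    noise = rng.standard_normal((n, n, n)) / np.sqrt(n)
    return beta * np.einsum('i,j,k->ijk', v, v, v) + noise, v

def unfolding_init(T):
    """Spectral initialization: top left singular vector of the n x n^2 unfolding."""
    n = T.shape[0]
    U, _, _ = np.linalg.svd(T.reshape(n, n * n), full_matrices=False)
    return U[:, 0]

def tensor_power_iteration(T, v, iters=50):
    """Iterate v <- T(., v, v) / ||T(., v, v)|| until (approximate) convergence."""
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    return v

T, v_true = spiked_tensor(n=50, beta=5.0)
v_hat = tensor_power_iteration(T, unfolding_init(T))
print('overlap |<v_hat, v_true>| =', abs(v_hat @ v_true))
```

Swapping `unfolding_init` for a random unit vector gives a quick sense of how much the spectral initialization helps at moderate signal-to-noise ratios.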


Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

arXiv.org Machine Learning

How does missing data affect our ability to learn signal structures? It has been shown that learning signal structure in terms of principal components depends on the ratio of sample size to dimensionality, and that a critical number of observations is needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that the effect of missing data is to effectively reduce the signal-to-noise ratio rather than, as generally believed, to reduce the sample size. The theory predicts a phase transition in the learning curves, and this is indeed found both in simulated data and in real datasets.
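
A minimal simulation can make the predicted phase transition visible. The sketch below assumes a spiked-covariance data model with entries deleted uniformly at random and zero (mean) imputation before PCA; this setup, and all parameter values, are illustrative assumptions rather than the paper's experimental protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
d, snr, p_obs = 200, 2.0, 0.7            # dimension, spike strength, observation rate

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                    # true signal direction

for alpha in [0.25, 0.5, 1.0, 2.0, 4.0]:  # samples per dimension
    m = int(alpha * d)
    # spiked model: x = sqrt(snr) * z * u + noise, with z ~ N(0, 1)
    X = np.sqrt(snr) * rng.standard_normal((m, 1)) * u + rng.standard_normal((m, d))
    mask = rng.random((m, d)) < p_obs     # keep each entry with probability p_obs
    Xm = np.where(mask, X, 0.0)           # zero-impute the missing entries
    # leading right singular vector = leading PC of the imputed data
    _, _, Vt = np.linalg.svd(Xm, full_matrices=False)
    print(f'alpha={alpha:4.2f}  overlap={abs(Vt[0] @ u):.3f}')
```

Lowering `p_obs` shifts the alpha at which the overlap departs from zero, consistent with missing data acting as a reduced effective signal-to-noise ratio rather than a reduced sample size.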


Resolution Limits of Sparse Coding in High Dimensions

Neural Information Processing Systems

Recent research suggests that neural systems employ sparse coding. However, there is limited theoretical understanding of the fundamental resolution limits of such sparse coding. This paper considers a general sparse estimation problem of detecting the sparsity pattern of a $k$-sparse vector in $\mathbb{R}^n$ from $m$ random noisy measurements. Our main results provide necessary and sufficient conditions on the problem dimensions, $m$, $n$ and $k$, and the signal-to-noise ratio (SNR) for asymptotically reliable detection. We show that a necessary condition for perfect recovery at any given SNR, for all algorithms regardless of complexity, is $m = \Omega(k\log(n-k))$ measurements.
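
To get a feel for the scaling, the snippet below evaluates $k\log(n-k)$ for a few problem sizes. The constant $C$ is a placeholder, since an $\Omega(\cdot)$ bound leaves it unspecified.

```python
import math

def min_measurements(n, k, C=1.0):
    """Necessary-condition scaling m >= C * k * log(n - k); C is illustrative."""
    return math.ceil(C * k * math.log(n - k))

for n, k in [(1_000, 10), (10_000, 10), (10_000, 100)]:
    print(f'n={n:6d}  k={k:4d}  m >= {min_measurements(n, k)}')
```

Note the weak (logarithmic) dependence on the ambient dimension $n$ compared with the linear dependence on the sparsity $k$.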


Interpolating Convex and Non-Convex Tensor Decompositions via the Subspace Norm

Neural Information Processing Systems

We consider the problem of recovering a low-rank tensor from its noisy observation. Previous work has shown a recovery guarantee with signal-to-noise ratio $O(n^{\lceil K/2 \rceil / 2})$ for recovering a $K$th order rank-one tensor of size $n\times \cdots \times n$ by recursive unfolding. In this paper, we first improve this bound to $O(n^{K/4})$ by a much simpler approach, but with a more careful analysis. Then we propose a new norm called the \textit{subspace} norm, which is based on the Kronecker products of factors obtained by the proposed simple estimator. The imposed Kronecker structure allows us to show a nearly ideal $O(\sqrt{n}+\sqrt{H^{K-1}})$ bound, in which the parameter $H$ controls the blend from the non-convex estimator to mode-wise nuclear norm minimization. Furthermore, we empirically demonstrate that the subspace norm achieves nearly ideal denoising performance even with $H=O(1)$.
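
As a rough sketch of the unfolding-and-SVD idea underlying these estimators (one standard mode-1 unfolding, not necessarily the exact estimator analyzed in the paper), the following numpy code recovers one factor of a noisy rank-one $K=3$ tensor. The spike strength is an illustrative choice set comfortably above the spectral norm of the noise unfolding, roughly $\sqrt{n^2} + \sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 40, 80.0                        # beta above noise spectral norm (~ n + sqrt(n))

# rank-one signal beta * u (x) v (x) w plus i.i.d. standard Gaussian noise
u, v, w = (rng.standard_normal(n) for _ in range(3))
u, v, w = (x / np.linalg.norm(x) for x in (u, v, w))
T = beta * np.einsum('i,j,k->ijk', u, v, w) + rng.standard_normal((n, n, n))

# mode-1 unfolding: an n x n^2 matrix whose top left singular vector estimates u
U, s, Vt = np.linalg.svd(T.reshape(n, n * n), full_matrices=False)
print('overlap with u:', abs(U[:, 0] @ u))
```

Unfolding along the other modes recovers $v$ and $w$ analogously; the paper's subspace norm then builds on Kronecker products of such mode-wise factor estimates.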