The Sharp Phase Transition of Tyler's M-Estimator for Robust Subspace Recovery

Lerman, Gilad, Zhang, Teng

arXiv.org Machine Learning 

Robust Subspace Recovery (RSR) aims to identify an underlying d-dimensional subspace from a dataset heavily corrupted by outliers. Complexity-theoretic results establish a threshold for the problem's computational hardness based on the dimensionscaled signal-to-noise ratio (DS-SNR): the problem is SSE-hard when the DS-SNR is strictly less than 1, and solvable via practical algorithms when it is greater than 1 under general position assumptions. However, the exact behavior of practical algorithms at the critical boundary DS-SNR = 1 has remained unknown. Specifically, we prove that TME converges exactly to the true subspace for DS-SNR 1 under a new stability condition, which is less restrictive than the general position assumptions used in prior literature. I. Introduction Robust Subspace Recovery (RSR) is a fundamental problem in robust statistics, machine learning, and computer vision. The primary goal of RSR is to identify an underlying low-dimensional linear subspace from a dataset that is heavily corrupted by outliers. The standard formulation of the noiseless RSR problem assumes a dataset X = {xi}Ni=1 RD consisting of n1 inliers lying exactly on a d-dimensional linear subspace L RD, and n0 outliers lying strictly off L . We refer to such a dataset as a noiseless inlier-outlier dataset, where the total number of points is N = n0 +n1. The central algorithmic question in noiseless RSR is under what conditions one can exactly and efficiently recover the underlying d-subspace L . A natural metric for characterizing the difficulty of this problem is the ratio of inliers to outliers, n1/n0, which can be viewed as a signal-to-noise ratio (SNR) [8], [11], [12]. This leads to the dimension-scaled SNR (DS-SNR), denoted by δS: δS:= n1/d n0/(D d) . Hardt and Moitra [5] established a fundamental lower bound, showing that when δS < 1, the noiseless RSR problem is Small Set Expansion (SSE)-hard, a property conjectured to be equivalent to NP-hardness [15]. In the special case of hyperplanes (d = D 1), they showed NP-hardness by invoking a result from [7]. The noiseless RSR problem is SSE-hard if δS < 1.