Tsakiris, Manolis C.
Shuffled Multi-Channel Sparse Signal Recovery
Koka, Taulant, Tsakiris, Manolis C., Muma, Michael, Haro, Benjamín Béjar
Mismatches between samples and their respective channel or target commonly arise in several real-world applications. For instance, whole-brain calcium imaging of freely moving organisms, multiple-target tracking, or multi-person contactless vital sign monitoring may be severely affected by mismatched sample-channel assignments. To address this fundamental problem systematically, we pose it as a signal reconstruction problem in which the correspondences between the samples and their respective channels have been lost. Assuming that a sensing matrix for the underlying signals is available, we show that the problem is equivalent to a structured unlabeled sensing problem, and we establish sufficient conditions for unique recovery. To the best of our knowledge, a sampling result for the reconstruction of shuffled multi-channel signals has not previously appeared in the literature, and existing methods for unlabeled sensing cannot be directly applied. We extend our results to the case where the signals admit a sparse representation in an overcomplete dictionary (i.e., the sensing matrix is not precisely known), and we derive sufficient conditions for the reconstruction of shuffled sparse signals. We propose a robust reconstruction method that combines sparse signal recovery with robust linear regression for the two-channel case. The performance and robustness of the proposed approach are illustrated in an application related to whole-brain calcium imaging. The proposed methodology can be generalized to sparse signal representations beyond those considered in this work, so as to apply to a variety of real-world problems with imprecise sample-channel assignments.
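The mismatched-assignment forward model is easy to simulate. The sketch below is illustrative only: the sizes, the Gaussian sensing matrix `A`, and the 30% swap rate are assumptions of this sketch, not values from the paper. It generates two sparse channel signals and swaps the channel assignment of a random subset of samples, producing the kind of data a recovery method would face.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 40, 60, 3                 # samples, dictionary atoms, sparsity (illustrative)
A = rng.standard_normal((m, n))     # assumed known sensing matrix

# two sparse channel signals
x1 = np.zeros(n); x1[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x2 = np.zeros(n); x2[rng.choice(n, s, replace=False)] = rng.standard_normal(s)

y1, y2 = A @ x1, A @ x2             # clean per-channel measurements

# shuffle: for a random subset of samples, swap the channel assignment
swap = rng.random(m) < 0.3
y1_obs = np.where(swap, y2, y1)
y2_obs = np.where(swap, y1, y2)
# y1_obs, y2_obs are what a mismatched acquisition delivers; recovery must
# jointly estimate (x1, x2) and the unknown per-sample swaps.
```

Note that the per-sample sums y1_obs + y2_obs are invariant under the swaps, which is one reason the two-channel case is tractable with robust regression.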
Results on the algebraic matroid of the determinantal variety
Tsakiris, Manolis C.
We make progress towards characterizing the algebraic matroid of the determinantal variety defined by the minors of fixed size of a matrix of variables. Our main result is a novel family of base sets of the matroid, which characterizes the matroid in special cases. Our approach relies on the combinatorial notion of relaxed supports of linkage matching fields that we introduce, our interpretation of the problem of completing a matrix of bounded rank from a subset of its entries as a linear section problem on the Grassmannian, and a connection that we draw with a class of local coordinates on the Grassmannian described by Sturmfels and Zelevinsky in 1993.
Unlabeled Principal Component Analysis
Yao, Yunzhen, Peng, Liangzu, Tsakiris, Manolis C.
We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.
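For intuition, the two-stage pipeline can be mimicked on a toy instance, with a plain SVD standing in for the stage-I robust-PCA step and a brute-force permutation search standing in for stage-II linear regression without correspondences. All sizes below are illustrative assumptions of this sketch, not the paper's method.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
m, r, N = 5, 2, 30            # ambient dim, subspace dim, number of points (toy sizes)
U = np.linalg.qr(rng.standard_normal((m, r)))[0]   # ground-truth column-space
X = U @ rng.standard_normal((r, N))

# permute the entries of one column (the partially shuffled regime)
perm = rng.permutation(m)
Xobs = X.copy()
Xobs[:, 0] = X[perm, 0]

# stage-I proxy: estimate the column-space from the unpermuted columns via SVD
Uhat = np.linalg.svd(Xobs[:, 1:], full_matrices=False)[0][:, :r]

# stage-II proxy: brute-force search over row permutations of the corrupted
# column for the one best explained by the estimated subspace
best = min(permutations(range(m)),
           key=lambda p: np.linalg.norm((np.eye(m) - Uhat @ Uhat.T) @ Xobs[list(p), 0]))
restored = Xobs[list(best), 0]
```

The brute-force search is only feasible for tiny m; the paper's stage-II instead uses dedicated methods for linear regression without correspondences.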
An exposition of the finiteness of fibers in matrix completion via Plücker coordinates
Tsakiris, Manolis C.
Low-rank matrix completion is a popular paradigm in machine learning, but little is known about the completion properties of non-random observation patterns. A fundamental open question in this direction is the following: given an observation pattern of a sufficiently generic (e.g., incoherent) $m \times n$ real matrix $X$ of rank $r$ with exactly $r(m+n-r)$ entries observed, this number being the dimension of the space of real rank-$r$ $m \times n$ matrices, are there finitely many rank-$r$ completions? This is a challenging problem whose answer is known only for ranks $1$, $2$ and $\min\{m,n\}-1$. In this paper we study this problem by bringing in tools from algebraic geometry. In particular, we exploit the fact that both the space of real rank-$r$ $m \times n$ matrices and the set of $r$-dimensional subspaces of $\mathbb{R}^m$, known as the Grassmannian, are algebraic varieties. Our approach is based on a novel formulation of matrix completion in terms of Plücker coordinates, a traditionally powerful tool in computer vision and graphics and a classical notion in algebraic geometry. This formulation allows us to characterize a large class of minimal (i.e., of size $r(m+n-r)$) observation patterns for which a generic matrix admits finitely many rank-$r$ completions. We conjecture that the converse is also true: any minimal pattern that is generically finitely completable must be of that type. As a consequence, we generalize results that have previously appeared and are being used in the literature but lack a sufficient theoretical justification. We believe the Plücker-coordinate link that we establish between low-rank matrices and the Grassmannian in the context of matrix completion to be of wider significance for matrix and subspace learning problems with incomplete data.
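A standard numerical proxy for generic finite completability (an assumption of this sketch, not the paper's Plücker machinery) is to check that the Jacobian of the observed entries of $X = UV^T$ with respect to $(U, V)$ has rank $r(m+n-r)$ at a generic point. The sketch below runs this check on the classical minimal pattern that fully observes the first $r$ rows and first $r$ columns, a pattern for which the completion is in fact generically unique.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 4, 4, 2
k = r * (m + n - r)                      # 12 = dimension of the rank-2 variety

# classical minimal pattern: first r rows and first r columns fully observed
mask = np.zeros((m, n), dtype=bool)
mask[:r, :] = True
mask[:, :r] = True
assert mask.sum() == k                   # exactly r(m+n-r) observed entries

# generic rank-r point X = U V^T; Jacobian of the observed entries w.r.t. (U, V)
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
J = np.zeros((k, r * (m + n)))
for t, (i, j) in enumerate(zip(*np.nonzero(mask))):
    J[t, i * r:(i + 1) * r] = V[j]                   # derivative w.r.t. row U_i
    J[t, r * m + j * r:r * m + (j + 1) * r] = U[i]   # derivative w.r.t. row V_j

# full rank k (the GL(r) gauge accounts for the r^2 deficiency from r(m+n))
finitely_completable = np.linalg.matrix_rank(J) == k
```

The rank can never exceed $r(m+n) - r^2 = r(m+n-r)$ because of the $GL(r)$ reparametrization $(U, V) \mapsto (UG, VG^{-T})$, so equality is the best possible outcome of the test.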
Homomorphic Sensing of Subspace Arrangements
Peng, Liangzu, Wang, Boshi, Tsakiris, Manolis C.
Homomorphic sensing is a recent algebraic-geometric framework that studies the unique recovery of points in a linear subspace from their images under a given collection of linear transformations. It has been successful in interpreting such a recovery in the case of permutations composed by coordinate projections, an important instance in applications known as unlabeled sensing, which models data that are out of order and have missing values. In this paper we make several fundamental contributions. First, we extend the homomorphic sensing framework from a single subspace to a subspace arrangement. Second, when specialized to a single subspace the new conditions are simpler and tighter. Third, as a natural consequence of our main theorem we obtain in a unified way recovery conditions for real phase retrieval, typically known via diverse techniques in the literature, as well as novel conditions for sparse and unsigned versions of linear regression without correspondences and unlabeled sensing. Finally, we prove that the homomorphic sensing property is locally stable to noise.
An Algebraic-Geometric Approach to Shuffled Linear Regression
Tsakiris, Manolis C., Peng, Liangzu, Conca, Aldo, Kneip, Laurent, Shi, Yuanming, Choi, Hayoung
Shuffled linear regression is the problem of performing a linear regression fit to a dataset for which the correspondences between the independent samples and the observations are unknown. Such a problem arises in diverse domains such as computer vision, communications and biology. In its simplest form, it is tantamount to solving a linear system of equations in which the entries of the right-hand side vector have been permuted. This type of data corruption renders the linear regression task considerably harder, even in the absence of other corruptions such as noise, outliers or missing entries. Existing methods are either applicable only to noiseless data or are very sensitive to initialization and work only for partially shuffled data. In this paper we address both of these issues via an algebraic-geometric approach, which uses symmetric polynomials to extract permutation-invariant constraints that the parameters $\boldsymbol{x} \in \mathbb{R}^n$ of the linear regression model must satisfy. This naturally leads to a polynomial system of $n$ equations in $n$ unknowns, which contains $\boldsymbol{x}$ in its root locus. Using the machinery of algebraic geometry, we prove that as long as the independent samples are generic, this polynomial system is always consistent with at most $n!$ complex roots, regardless of any type of corruption inflicted on the observations. The algorithmic implication of this fact is that one can always solve this polynomial system and use its most suitable root as initialization to the Expectation-Maximization algorithm. To the best of our knowledge, the resulting method is the first working solution for small values of $n$ able to handle thousands of fully shuffled noisy observations in milliseconds.
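Concretely, one convenient family of permutation-invariant constraints is the power sums: in the noiseless case the power sums of $A\boldsymbol{x}$ must match those of the shuffled observations $\boldsymbol{b}$. The sketch below verifies this invariance and then, as an illustrative stand-in for solving the polynomial system globally, refines a starting point near the ground truth with Newton's method; the sizes, the starting point, and the Newton solver are assumptions of this sketch, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 3
A = rng.standard_normal((m, n))        # generic independent samples
x = rng.standard_normal(n)             # ground-truth regression vector
b = rng.permutation(A @ x)             # observations in unknown order (noiseless)

# permutation-invariant constraints: the power sums of Ax equal those of b
for k in range(1, n + 1):
    assert np.isclose(np.sum((A @ x) ** k), np.sum(b ** k))

def newton(x0, iters=20):
    """Newton's method on the n-by-n polynomial system of power-sum equations."""
    xe = x0.copy()
    for _ in range(iters):
        z = A @ xe
        F = np.array([np.sum(z ** k) - np.sum(b ** k) for k in range(1, n + 1)])
        # dF_k / dx_j = k * sum_i A_ij * z_i^(k-1)
        J = np.array([k * (A * z[:, None] ** (k - 1)).sum(axis=0)
                      for k in range(1, n + 1)])
        xe = xe - np.linalg.solve(J, F)
    return xe

# started near the truth for illustration; the paper solves the system globally
x_hat = newton(x + 0.01 * rng.standard_normal(n))
```

The system has at most $n!$ complex roots, so a global solver returns a short candidate list from which the most suitable root seeds Expectation-Maximization.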
Theoretical Analysis of Sparse Subspace Clustering with Missing Entries
Tsakiris, Manolis C., Vidal, Rene
Sparse Subspace Clustering (SSC) is a popular unsupervised machine learning method for clustering data lying close to an unknown union of low-dimensional linear subspaces; a problem with numerous applications in pattern recognition and computer vision. Even though the behavior of SSC for complete data is by now well-understood, little is known about its theoretical properties when applied to data with missing entries. In this paper we give theoretical guarantees for SSC with incomplete data, and analytically establish that projecting the zero-filled data onto the observation pattern of the point being expressed leads to a substantial improvement in performance. The main insight that stems from our analysis is that even though the projection induces additional missing entries, this is counterbalanced by the fact that the projected and zero-filled data are in effect incomplete points associated with the union of the corresponding projected subspaces, with respect to which the point being expressed is complete. The significance of this phenomenon potentially extends to the entire class of self-expressive methods.
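The projection step can be illustrated on a toy instance: restrict the zero-filled dictionary to the observation pattern of the point being expressed before self-expressing it. For brevity this sketch uses plain least squares in place of the $\ell_1$ self-expression program, and the sizes and observation rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
D, d, N = 9, 2, 20
U = np.linalg.qr(rng.standard_normal((D, d)))[0]
X = U @ rng.standard_normal((d, N))          # points on one low-dimensional subspace

M = rng.random((D, N)) < 0.7                 # observation mask (70% observed)
X0 = np.where(M, X, 0.0)                     # zero-filled incomplete data

j = 0
omega = M[:, j]                              # pattern of the point being expressed
# project the zero-filled dictionary onto omega; on omega the point x_j is complete
A = X0[omega][:, np.arange(N) != j]
y = X0[omega, j]
c, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares proxy for the l1 program
residual = np.linalg.norm(A @ c - y)
```

The projection induces additional missing entries in the dictionary columns, but with respect to the projected coordinates the point being expressed is complete, which is the phenomenon the analysis exploits.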
Hyperplane Clustering Via Dual Principal Component Pursuit
Tsakiris, Manolis C., Vidal, Rene
We extend the theoretical analysis of a recently proposed single subspace learning algorithm, called Dual Principal Component Pursuit (DPCP), to the case where the data are drawn from a union of hyperplanes. To gain insight into the properties of the non-convex $\ell_1$ problem associated with DPCP, we develop a geometric analysis of a closely related continuous optimization problem. Transferring this analysis to the discrete problem, our results state that as long as the hyperplanes are sufficiently separated, the dominant hyperplane is sufficiently dominant and the points are uniformly distributed inside the associated hyperplanes, the non-convex DPCP problem has a unique global solution, equal to the normal vector of the dominant hyperplane. This suggests the correctness of a sequential hyperplane learning algorithm based on DPCP. A thorough experimental evaluation reveals that hyperplane learning schemes based on DPCP dramatically improve over state-of-the-art methods on synthetic data, while remaining competitive with the state of the art in 3D plane clustering of Kinect data.
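The DPCP problem $\min_{\|b\|_2 = 1} \|X^T b\|_1$ is commonly attacked with iteratively reweighted least squares, where each iteration takes the smallest eigenvector of a weighted scatter matrix. The sketch below recovers the normal of a single hyperplane corrupted by outliers; the setup, iteration count, and weight floor are illustrative assumptions of this sketch rather than the paper's solver.

```python
import numpy as np

rng = np.random.default_rng(5)
D, N_in, N_out = 3, 200, 30
n_true = np.array([0.0, 0.0, 1.0])       # normal of the hyperplane z = 0
X_in = rng.standard_normal((D, N_in))
X_in[2] = 0.0                            # inliers lie in the hyperplane
X_out = rng.standard_normal((D, N_out))  # outliers
X = np.hstack([X_in, X_out])

# IRLS for min_b ||X^T b||_1 s.t. ||b||_2 = 1: each step minimizes the weighted
# quadratic b^T (sum_j w_j x_j x_j^T) b over the unit sphere
b = np.linalg.svd(X)[0][:, -1]           # spectral initialization
for _ in range(100):
    w = 1.0 / np.maximum(np.abs(X.T @ b), 1e-8)   # floor avoids division by zero
    b = np.linalg.eigh((X * w) @ X.T)[1][:, 0]    # smallest eigenvector

cos_angle = abs(float(n_true @ b))       # approaches 1 when b recovers the normal
```

Running this sequentially, i.e. removing the points of the recovered hyperplane and repeating, gives the flavor of the sequential hyperplane learning scheme the analysis supports.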