Loukas, Andreas


Approximating Spectral Clustering via Sampling: a Review

arXiv.org Machine Learning

Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key to these algorithms' success and their Achilles heel: forming a graph and computing its dominant eigenvectors can indeed be computationally prohibitive when dealing with more than a few tens of thousands of points. In this paper, we review the principal research efforts aiming to reduce this computational cost. We focus on methods that come with a theoretical control on the clustering performance and incorporate some form of sampling in their operation. Such methods abound in the machine learning, numerical linear algebra, and graph signal processing literature and, amongst others, include Nyström approximation, landmarks, coarsening, coresets, and compressive spectral clustering. We present the approximation guarantees available for each and discuss practical merits and limitations. Surprisingly, despite the breadth of the literature explored, we conclude that there is still a gap between theory and practice: the most scalable methods are only intuitively motivated or loosely controlled, whereas those that come with end-to-end guarantees rely on strong assumptions or offer only a limited gain in computation time.
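
As a point of reference for the costs discussed above, here is a minimal sketch of the exact pipeline that the surveyed methods approximate, written with numpy/scipy/scikit-learn; the function name and the Gaussian-kernel choice are illustrative, not taken from the review.

```python
# Exact (unapproximated) spectral clustering; a sketch for reference only.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Dense Gaussian similarity graph: O(n^2) memory, the first bottleneck.
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)  # assumes every point has positive degree
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Dominant eigenvectors: O(n^3) if done densely, the second bottleneck.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Row-normalize the spectral embedding, then run k-means on it.
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

The two commented bottlenecks (the quadratic-cost similarity graph and the cubic-cost eigendecomposition) are precisely what Nyström approximation, landmarks, coarsening, coresets, and compressive spectral clustering attack from different angles.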


Graph reduction by local variation

arXiv.org Machine Learning

How can we reduce the size of a graph without significantly altering its basic properties? We approach the graph reduction problem from the perspective of restricted similarity, a modification of a well-known measure for graph approximation. Our choice is motivated by the observation that restricted similarity implies strong spectral guarantees and can be used to prove statements about certain unsupervised learning problems. The paper then focuses on coarsening, a popular type of graph reduction. We derive sufficient conditions for a small graph to approximate a larger one in the sense of restricted similarity. Our theoretical findings give rise to a novel quasi-linear algorithm. Compared to both standard and advanced graph reduction methods, the proposed algorithm finds coarse graphs of improved quality, often by a large margin, without sacrificing speed.
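
For intuition, the following hypothetical sketch shows only the mechanics of coarsening, i.e., how merging vertices through a partition matrix produces a smaller Laplacian. The paper's actual contribution, selecting which vertex sets to contract via local variation and working with a normalized coarsening matrix to obtain restricted-similarity guarantees, is not shown.

```python
# Mechanics of coarsening only: with a 0/1 partition matrix P, the matrix
# P @ L @ P.T is the Laplacian of the graph obtained by contracting groups.
import numpy as np

def coarsen_laplacian(L, partition):
    """L: (n, n) graph Laplacian; partition: length-n array of coarse ids."""
    n, nc = len(L), partition.max() + 1
    P = np.zeros((nc, n))
    P[partition, np.arange(n)] = 1.0  # row i indicates vertices merged into i
    return P @ L @ P.T                # Laplacian of the contracted graph

# Toy usage: a path on 6 vertices coarsened to a path on 3 by merging pairs.
A = np.diag(np.ones(5), 1); A += A.T
L = np.diag(A.sum(1)) - A
Lc = coarsen_laplacian(L, np.array([0, 0, 1, 1, 2, 2]))
```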


Spectrally approximating large graphs with smaller graphs

arXiv.org Machine Learning

How does coarsening affect the spectrum of a general graph? We provide conditions under which the principal eigenvalues and eigenspaces of the coarsened and original graph Laplacian matrices are close. The achieved approximation is shown to depend on standard graph-theoretic properties, such as the degree and eigenvalue distributions, as well as on the ratio between the coarsened and actual graph sizes. Our results carry implications for learning methods that utilize coarsening. For the particular case of spectral clustering, they imply that coarse eigenvectors can be used to derive good quality assignments even without refinement; this phenomenon was previously observed but lacked formal justification.
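
A hedged numerical illustration of the question posed above: contract random vertex pairs of a random graph and compare the first few Laplacian eigenvalues before and after. The random pairing is a placeholder; the paper's bounds tie the observed gap to the degree and eigenvalue distributions and to the reduction ratio.

```python
# Empirical check: how close are the principal eigenvalues after contraction?
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, k = 100, 5
A = (rng.random((n, n)) < 0.08).astype(float)
A = np.triu(A, 1); A += A.T                   # random undirected graph
L = np.diag(A.sum(1)) - A

partition = rng.permutation(n) // 2           # contract random vertex pairs
P = np.zeros((n // 2, n))
P[partition, np.arange(n)] = 1.0
Lc = P @ L @ P.T                              # coarse Laplacian

lam = eigh(L, eigvals_only=True)[:k]
lam_c = eigh(Lc, eigvals_only=True)[:k]
print("original :", np.round(lam, 3))
print("coarsened:", np.round(lam_c, 3))       # close when contraction is benign
```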


Fast Approximate Spectral Clustering for Dynamic Networks

arXiv.org Machine Learning

Spectral clustering is a widely studied problem, yet its complexity is prohibitive for dynamic graphs of even modest size. We claim that it is possible to reuse information from past cluster assignments to expedite computation. Our approach builds on a recent idea of sidestepping the main bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by using fast Chebyshev graph filtering of random signals. We show that the proposed algorithm achieves clustering assignments with quality approximating that of spectral clustering, and that it can yield significant complexity benefits when the graph dynamics are appropriately bounded.
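
The static building block referred to above, eigenvector-free spectral features obtained by Chebyshev graph filtering of random signals, can be sketched as follows. The function name and parameter choices are illustrative, lam_cut and lam_max would in practice be estimated, and the paper's contribution of reusing past assignments over a dynamic graph is not shown.

```python
# Low-pass Chebyshev filtering of random signals: the rows of Y approximate
# the spectral embedding without computing any eigenvectors.
import numpy as np

def cheby_lowpass_features(L, lam_cut, lam_max, n_signals=20, order=30):
    n = L.shape[0]
    # Chebyshev interpolation coefficients of the step 1_{lam <= lam_cut},
    # computed on the spectrum rescaled to [-1, 1].
    theta = (np.arange(order + 1) + 0.5) * np.pi / (order + 1)
    x = np.cos(theta)                            # Chebyshev nodes
    h = ((x + 1) * lam_max / 2 <= lam_cut).astype(float)
    c = (2.0 / (order + 1)) * np.array(
        [np.sum(h * np.cos(j * theta)) for j in range(order + 1)])
    c[0] /= 2.0
    # Apply the polynomial filter to random signals via the 3-term recurrence.
    R = np.random.randn(n, n_signals) / np.sqrt(n_signals)
    Lhat = (2.0 / lam_max) * L - np.eye(n)       # spectrum mapped to [-1, 1]
    T_prev, T_cur = R, Lhat @ R
    Y = c[0] * T_prev + c[1] * T_cur
    for j in range(2, order + 1):
        T_prev, T_cur = T_cur, 2 * (Lhat @ T_cur) - T_prev
        Y += c[j] * T_cur
    return Y  # k-means on the rows of Y replaces k-means on exact eigenvectors
```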


Stationary time-vertex signal processing

arXiv.org Machine Learning

The goal of this paper is to improve learning for multivariate processes whose structure depends on some known graph topology, especially when the number of available samples is much smaller than the number of variables. Typically, the graph information is incorporated into the learning process via a smoothness assumption postulating that the values supported on well-connected vertices exhibit small variations. We argue that smoothness is not enough. To capture the behavior of complex interconnected systems, such as transportation and biological networks, it is important to train expressive models that are able to reproduce a wide range of graph and temporal behaviors. Motivated by this need, this paper puts forth a novel definition of time-vertex wide-sense stationarity, or joint stationarity for short. We believe that the proposed definition is natural, as it intimately relates to existing definitions of stationarity in the time and vertex domains. We use joint stationarity to regularize learning and to reduce computational complexity in both estimation and recovery tasks. In particular, we show that for any jointly stationary process: (a) one can learn the covariance structure from O(1) samples, and (b) one can solve MMSE recovery problems, such as interpolation, denoising, and forecasting, with complexity that is linear in the number of edges and timesteps. Experiments with three datasets suggest that joint stationarity can yield significant accuracy improvements when reconstructing under-sampled problems, even when the graph is only approximately known or the process is only close to stationary.
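
Claim (a) can be made concrete with a small hedged sketch: a jointly stationary process is diagonalized by the joint (graph plus time) Fourier transform, so instead of an NT x NT covariance matrix one estimates an N x T joint power spectral density. Function names below are illustrative.

```python
# Covariance learning under joint stationarity: average squared joint
# Fourier coefficients across realizations to estimate the joint PSD.
import numpy as np
from scipy.linalg import eigh

def jft(X, U):
    """Joint Fourier transform of an (N, T) time-vertex signal:
    graph Fourier transform over vertices, DFT over time."""
    return np.fft.fft(U.T @ X, axis=1) / np.sqrt(X.shape[1])

def estimate_joint_psd(samples, L):
    """samples: list of (N, T) realizations; L: (N, N) graph Laplacian."""
    _, U = eigh(L)                   # graph Fourier basis
    P = np.zeros(samples[0].shape)
    for X in samples:
        P += np.abs(jft(X, U)) ** 2  # periodogram of one realization
    return P / len(samples), U       # N*T numbers instead of (N*T)^2
```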


How close are the eigenvectors and eigenvalues of the sample and actual covariance matrices?

arXiv.org Machine Learning

How many samples are sufficient to guarantee that the eigenvectors and eigenvalues of the sample covariance matrix are close to those of the actual covariance matrix? For a wide family of distributions, including distributions with finite second moment and distributions supported in a centered Euclidean ball, we prove that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance. Our findings imply non-asymptotic concentration bounds for eigenvectors, eigenspaces, and eigenvalues. They also provide conditions for distinguishing principal components based on a constant number of samples.
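
A small hedged experiment in the spirit of the result above: draw increasingly many samples from a Gaussian with a well-separated spectrum and track the alignment between sample and true eigenvectors. All names and constants below are illustrative.

```python
# Alignment between sample and true covariance eigenvectors vs sample size.
import numpy as np

rng = np.random.default_rng(1)
d = 20
true_eigvals = np.arange(d, 0, -1, dtype=float)   # well-separated spectrum
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # true eigenvectors
C = Q @ np.diag(true_eigvals) @ Q.T

for n in (50, 500, 5000):
    X = rng.multivariate_normal(np.zeros(d), C, size=n)
    C_hat = X.T @ X / n
    w, V = np.linalg.eigh(C_hat)                  # ascending eigenvalue order
    V = V[:, ::-1]                                # match descending true order
    align = np.abs(np.sum(V * Q, axis=0))         # |<v_i_hat, v_i>| per index
    print(n, np.round(align[:3], 3))              # top components align first
```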


Autoregressive Moving Average Graph Filtering

arXiv.org Machine Learning

Graph filters, direct analogues of classical filters but intended for signals defined on graphs, are one of the cornerstones of the field of signal processing on graphs. This work brings forth new insights into the distributed graph filtering problem. We design a family of autoregressive moving average (ARMA) recursions, which (i) are able to approximate any desired graph frequency response, and (ii) give exact solutions for tasks such as graph signal denoising and interpolation. The design philosophy, which allows us to design the ARMA coefficients independently from the underlying graph, renders the ARMA graph filters suitable in static and, particularly, time-varying settings. The latter occur when the graph signal and/or graph are changing over time. We show that in the case of a time-varying graph signal our approach extends naturally to a two-dimensional filter, operating concurrently in the graph and regular time domains. We also derive sufficient conditions for filter stability when the graph and signal are time-varying. The analytical and numerical results presented in this paper illustrate that ARMA graph filters are practically appealing for static and time-varying settings, as predicted by theoretical derivations.
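
As a hedged illustration of the ARMA idea, the following sketch implements only the simplest first-order recursion, realizing a rational graph frequency response through repeated neighbor exchanges; the paper works with a shifted Laplacian and designs coefficients for arbitrary responses, which is not shown.

```python
# ARMA_1 flavor: iterating y <- psi*L@y + phi*x realizes the rational
# response phi / (1 - psi*lam) without any eigendecomposition. Stability
# requires |psi| * lam_max(L) < 1.
import numpy as np

def arma1_filter(L, x, psi, phi, n_iter=100):
    """Converges to phi * (I - psi*L)^{-1} @ x when stable."""
    y = np.zeros_like(x)
    for _ in range(n_iter):
        y = psi * (L @ y) + phi * x   # one exchange with graph neighbors
    return y

# Sanity check on a small cycle graph: the iterate matches the closed form.
n = 8
A = np.roll(np.eye(n), 1, axis=1); A += A.T
L = np.diag(A.sum(1)) - A
x = np.random.randn(n)
psi, phi = 0.1, 1.0                   # |psi| * lam_max < 1 here
y = arma1_filter(L, x, psi, phi)
assert np.allclose(y, phi * np.linalg.solve(np.eye(n) - psi * L, x))
```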


Predicting the evolution of stationary graph signals

arXiv.org Machine Learning

An emerging way of tackling the dimensionality issues arising in the modeling of a multivariate process is to assume that the inherent data structure can be captured by a graph. Nevertheless, though state-of-the-art graph-based methods have been successful for many learning tasks, they do not consider time-evolving signals and thus are not suitable for prediction. Based on the recently introduced joint stationarity framework for time-vertex processes, this letter considers multivariate models that exploit the graph topology to facilitate prediction. The resulting method yields similar accuracy to the joint (time-graph) mean-squared error estimator but at lower complexity, and outperforms purely time-based methods.
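
A purely hypothetical sketch of the flavor of such a predictor (the letter's estimator is derived from joint stationarity, not from the least-squares fit used here): predict the next snapshot as a low-order polynomial graph filter of the current one.

```python
# Graph-aware one-step prediction: x_{t+1} ~ sum_k h_k * L^k @ x_t,
# with the filter taps h fit by least squares on past snapshots.
import numpy as np

def fit_graph_ar(L, X, order=3):
    """X: (N, T) matrix of past snapshots. Returns taps h and cached L^k."""
    N, T = X.shape
    powers = [np.linalg.matrix_power(L, k) for k in range(order + 1)]
    # Regression features: L^k x_t stacked over all transitions t -> t+1.
    F = np.stack([np.concatenate([P @ X[:, t] for t in range(T - 1)])
                  for P in powers], axis=1)
    y = X[:, 1:].T.reshape(-1)        # targets x_1, ..., x_{T-1}, time-major
    h, *_ = np.linalg.lstsq(F, y, rcond=None)
    return h, powers

def predict_next(h, powers, x_t):
    return sum(hk * (P @ x_t) for hk, P in zip(h, powers))
```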


Towards stationary time-vertex signal processing

arXiv.org Machine Learning

Graph-based methods for signal processing have shown promise for the analysis of data exhibiting irregular structure, such as those found in social, transportation, and sensor networks. Yet, though these systems are often dynamic, state-of-the-art methods for signal processing on graphs ignore the dimension of time, treating successive graph signals independently or taking a global average. To address this shortcoming, this paper considers the statistical analysis of time-varying graph signals. We introduce a novel definition of joint (time-vertex) stationarity, which generalizes the classical definition of time stationarity and the more recent definition appropriate for graphs. Joint stationarity gives rise to a scalable and provably optimal Wiener optimization framework for joint denoising, semi-supervised learning, or, more generally, inverting a linear operator. Experimental results on real weather data demonstrate that taking graph and time dimensions into account jointly can yield significant accuracy improvements in the reconstruction effort.
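
A hedged sketch of the joint Wiener denoiser, assuming a known joint power spectral density P (an N x T array) and white noise of variance sigma2; the eigendecomposition below is for clarity only, whereas scalability in practice comes from avoiding it with fast graph filtering.

```python
# Joint Wiener denoising: shrink each joint Fourier coefficient by
# P / (P + sigma2), the MMSE rule for a jointly stationary signal in
# white noise.
import numpy as np
from scipy.linalg import eigh

def joint_wiener_denoise(Y, L, P, sigma2):
    """Y: noisy (N, T) time-vertex signal; L: graph Laplacian; P: joint PSD."""
    _, U = eigh(L)                                 # graph Fourier basis
    Yhat = np.fft.fft(U.T @ Y, axis=1)             # joint Fourier transform
    Yhat *= P / (P + sigma2)                       # per-coefficient shrinkage
    return U @ np.real(np.fft.ifft(Yhat, axis=1))  # back to vertex-time domain
```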


Frequency Analysis of Temporal Graph Signals

arXiv.org Machine Learning

This letter extends the concept of graph frequency to graph signals that evolve with time. Our goal is to generalize and, in fact, unify the familiar concepts from time- and graph-frequency analysis. To this end, we study a joint temporal and graph Fourier transform (JFT) and demonstrate its attractive properties. We build on our results to create filters that act on the joint (temporal and graph) frequency domain, and show how these can be used to perform interference cancellation. The proposed algorithms are distributed, have linear complexity, and can approximate any desired joint filtering objective.
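
A minimal sketch of the JFT and of joint filtering by pointwise multiplication in the joint frequency domain; the eigendecomposition is for exposition only, since the letter's algorithms are distributed and realize such filters with linear complexity.

```python
# JFT = graph Fourier transform over vertices composed with a DFT over time;
# a joint filter with (N, T) response H acts pointwise in this 2-D domain.
import numpy as np
from scipy.linalg import eigh

def jft(X, U):
    """X: (N, T) temporal graph signal; U: Laplacian eigenvectors (GFT basis)."""
    return np.fft.fft(U.T @ X, axis=1)  # GFT over vertices, DFT over time

def joint_filter(X, L, H):
    """Apply a joint filter with (N, T) frequency response H to X."""
    _, U = eigh(L)
    return U @ np.real(np.fft.ifft(H * jft(X, U), axis=1))
```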