A Topological Approach to Spectral Clustering
The analysis of complex, high-dimensional data is one of the major research challenges in contemporary computer science and statistics. In recent years, geometric and topological approaches to data analysis have begun to yield important insights into the structure of complex data sets (see, for instance, [1] for an example of spectral geometry applied to dimension reduction, and [6], [2] for surveys of homological methods for data analysis and visualization). The common point of departure of these methods is the assumption that data in high-dimensional spaces is often concentrated around a low-dimensional manifold or other topological space. In this note, we begin from the assumption that the data comes from a uniform distribution supported on a topologically disconnected space, and that clusters in the data reflect this lack of topological connectivity. Geometric techniques for data analysis have concentrated on approximating the geometry of the data as a step toward nonlinear dimension reduction. Once the dimension is reduced, standard statistical techniques are used to analyze the data in the lower-dimensional space.
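The working assumption above, that clusters correspond to the connected components of the space supporting the data, can be illustrated with a small toy sketch. It is not the method developed in this note: it simply samples points uniformly from two disjoint disks and counts the connected components of a neighborhood graph. The names `sample_disk` and `count_components`, the disk centers and radii, the sample sizes, and the threshold `eps` are all assumptions chosen for illustration.

```python
import math
import random


def sample_disk(center, radius, n, rng):
    """Draw n points uniformly from a disk by rejection sampling."""
    cx, cy = center
    points = []
    while len(points) < n:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            points.append((cx + x, cy + y))
    return points


def count_components(points, eps):
    """Count connected components of the eps-neighborhood graph (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join any two points whose Euclidean distance is at most eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(len(points))})


# Assumed toy support: two unit disks whose centers are 5 units apart.
rng = random.Random(0)
points = sample_disk((0.0, 0.0), 1.0, 200, rng) + sample_disk((5.0, 0.0), 1.0, 200, rng)
n_clusters = count_components(points, eps=0.5)
print(n_clusters)
```

With a dense sample and a threshold smaller than the gap between the disks, one would expect the component count to match the number of disks. In less favorable settings (sparse samples, noise, or nearly touching components) the count depends strongly on `eps`.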
Jun-8-2015