Estimating the effective dimension of large biological datasets using Fisher separability analysis

Luca Albergante, Jonathan Bac, Andrei Zinovyev

arXiv.org Machine Learning 

Abstract--Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds frequently possess structures that significantly decrease their intrinsic dimensionality (ID): clusters, points located close to low-dimensional varieties, or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets.

Moreover, it is frequently assumed that the nature of this variety is a manifold, and that the data point cloud represents an i.i.d. sample. In practice, the ID of the manifold is assumed to be not only much smaller than the number of variables defining the data space but also small in absolute terms. Thus, any practically useful nonlinear data manifold should not have more than three or four intrinsic degrees of freedom. Theoretically, the manifold concept does not have to be universal in the case of real-life datasets.
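To make the notion of "separability properties of data points" concrete, the following is a minimal sketch, not the estimator evaluated in the paper. It assumes the standard definition of Fisher separability, under which a point x is separable from a point y with threshold alpha when (x, y) <= alpha * (x, x). The function name `inseparability_fraction`, the default `alpha=0.8`, and the simplified preprocessing (centering and projection onto the unit sphere, with no whitening) are illustrative assumptions. Converting the resulting fraction into a dimension value requires a further step that is not reproduced here.

```python
import numpy as np


def inseparability_fraction(X, alpha=0.8):
    """Mean fraction of other points from which a point is NOT Fisher-separable.

    Hypothetical helper for illustration only; alpha=0.8 is an assumed default.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center the point cloud

    # Project every point onto the unit sphere (whitening is skipped here).
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    Xs = Xc / np.where(norms > 0, norms, 1.0)

    G = Xs @ Xs.T        # inner products (x_i, x_j)
    diag = np.diag(G)    # (x_i, x_i)

    # x_i is inseparable from x_j when (x_i, x_j) > alpha * (x_i, x_i).
    insep = G > alpha * diag[:, None]
    np.fill_diagonal(insep, False)  # ignore each point paired with itself

    return insep.sum(axis=1).mean() / (len(X) - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = rng.normal(size=(500, 2))    # 2-dimensional Gaussian cloud
    high = rng.normal(size=(500, 50))  # 50-dimensional Gaussian cloud
    print(inseparability_fraction(low), inseparability_fraction(high))
```

Under these assumptions, the 2-dimensional cloud yields a noticeably larger inseparable fraction than the 50-dimensional one. This is the qualitative behaviour a separability-based estimator relies on: points become easier to separate from one another as the effective dimension grows.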
