Estimating the effective dimension of large biological datasets using Fisher separability analysis

Albergante, Luca; Bac, Jonathan; Zinovyev, Andrei

Jan-18-2019–arXiv.org Machine Learning

Abstract: Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds frequently possess structures that significantly decrease their intrinsic dimensionality (ID), due to the presence of clusters, points being located close to low-dimensional varieties, or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets.

Moreover, it is frequently assumed that the nature of this variety is a manifold, and that the data point cloud represents an i.i.d. sample. In practice, the ID of the manifold is assumed to be not only much smaller than the number of variables defining the data space but also small in absolute terms. Thus, any practically useful nonlinear data manifold should not have more than three or four intrinsic degrees of freedom. Theoretically, the manifold concept does not have to be universal in the case of real-life datasets.
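The separability-based estimator named in the abstract can be illustrated with a small, simplified sketch; it is not the authors' implementation. It uses the usual Fisher-separability criterion: a point x is Fisher-separable from y with threshold alpha if (x, y) <= alpha (x, x), which for unit vectors reduces to (x, y) <= alpha. The observed fraction p of non-separable pairs is then inverted for a dimension n using the standard asymptotic approximation p ≈ (1 - alpha^2)^((n - 1)/2) / (alpha sqrt(2 pi n)) for points spread uniformly on the unit sphere in R^n, which leads to a Lambert W expression. Assumptions made here: the data cloud is already roughly isotropic (the PCA and whitening preprocessing one would normally apply is omitted), a single threshold alpha = 0.8 is used instead of scanning a range, and all function names (`lambert_w0`, `to_unit_sphere`, `inseparable_fraction`, `separability_dimension`, `estimate_dimension`) are hypothetical.

```python
# Simplified, hypothetical sketch of a Fisher-separability dimension estimate.
# Not the published implementation: no PCA/whitening, single alpha only.
import math
import random


def lambert_w0(z: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch of the Lambert W function for z >= 0, via Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starts at or above the root, so Newton converges monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def to_unit_sphere(points):
    """Centre the cloud at its mean and rescale each nonzero vector to unit length."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    unit = []
    for p in points:
        c = [p[j] - mean[j] for j in range(dim)]
        norm = math.sqrt(sum(v * v for v in c))
        if norm > 0.0:
            unit.append([v / norm for v in c])
    return unit


def inseparable_fraction(unit, alpha):
    """Fraction of ordered pairs (x, y), x != y, where x is NOT Fisher-separable from y.

    For unit vectors the criterion (x, y) <= alpha * (x, x) becomes (x, y) <= alpha,
    which is symmetric, so each unordered pair is checked once and counted twice.
    """
    n = len(unit)
    bad = 0
    for i in range(n):
        xi = unit[i]
        for k in range(i + 1, n):
            if sum(a * b for a, b in zip(xi, unit[k])) > alpha:
                bad += 2
    return bad / (n * (n - 1))


def separability_dimension(p_mean, alpha):
    """Solve p = (1 - a^2)^((n - 1) / 2) / (a * sqrt(2 * pi * n)) for n via Lambert W."""
    if p_mean <= 0.0:
        raise ValueError("no inseparable pairs observed; try a smaller alpha or more points")
    big_l = -math.log(1.0 - alpha ** 2)
    z = big_l / (2.0 * math.pi * p_mean ** 2 * alpha ** 2 * (1.0 - alpha ** 2))
    return lambert_w0(z) / big_l


def estimate_dimension(points, alpha=0.8):
    """Hypothetical single-alpha estimate; assumes an isotropic cloud (no whitening)."""
    unit = to_unit_sphere(points)
    return separability_dimension(inseparable_fraction(unit, alpha), alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (5, 10):
        # Isotropic Gaussian samples project to roughly uniform points on the sphere.
        cloud = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(600)]
        print(d, round(estimate_dimension(cloud), 2))
```

Run as a script, it prints one estimate per synthetic Gaussian cloud; for such isotropic samples the estimates should land close to the ambient dimensions 5 and 10. The pairwise loop is quadratic in the number of points, so realistic biological datasets would call for a vectorised or subsampled variant, plus the preprocessing omitted here.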

dataset, dimension, dimensionality

Country:
- Asia > Russia (0.04)
- North America > United States
  - New York (0.04)
- Europe
  - Russia (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (0.93)
  - Therapeutic Area > Oncology (0.46)

Technology:
- Information Technology
  - Data Science > Data Mining (0.68)
  - Artificial Intelligence
    - Machine Learning > Statistical Learning (1.00)
    - Representation & Reasoning (0.93)
