Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation

Ruben Becker, Imane Hafnaoui, Michael E. Houle, Pan Li, Arthur Zimek

Jul-15-2019–arXiv.org Machine Learning

In data mining, machine learning, and other areas of AI, we are often faced with datasets that contain many more attributes than are needed for, or even helpful to, tasks such as clustering or classification. Problems associated with such high-dimensional data include, for example, the concentration effect of distances [13, 20] and irrelevant features [25, 49]. For clustering [31] and outlier detection [49], researchers have used various techniques to identify relevant subspaces, defined as subsets of features that are informative for a particular task. Examples of how relevant subspaces can be determined for individual clusters or outliers include local density estimation within a systematic search through candidate subspaces (often following the Apriori principle [7] in one of its various adaptations to the subspace search problem [48]), and the adaptation of distance measures based on the distribution within local neighborhoods (using some analysis of variance, or even of covariance typically based on PCA, so that the adaptation can also account for correlated features). For sufficiently tight local neighborhoods, the underlying local data manifold can be regarded as approaching a linear form [40], an assumption that further justifies selecting locally relevant features to determine the subspace.
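
The variance-based idea mentioned above (flagging, for a given point, the attributes along which its local neighborhood is unusually concentrated) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the local intrinsic dimensionality method developed in this paper. The function names `knn`, `locally_relevant_attributes` and `make_cluster`, the neighborhood size `k`, the synthetic data, and the threshold rule (local variance below `ratio` times the global variance) are all hypothetical choices made for the example. It analyses variance per attribute only; capturing correlated features would require a PCA of the local covariance matrix instead.

```python
import math
import random
from statistics import pvariance


def knn(data, query, k):
    """Return the k points of `data` nearest to `query` under Euclidean
    distance. The query itself is included if it belongs to `data`."""
    return sorted(data, key=lambda p: math.dist(p, query))[:k]


def locally_relevant_attributes(data, query, k, ratio=0.05):
    """Hypothetical rule: attribute j is locally relevant for `query` if the
    variance of attribute j within the k-NN neighborhood is below `ratio`
    times the variance of attribute j over the whole dataset."""
    neighborhood = knn(data, query, k)
    relevant = []
    for j in range(len(query)):
        global_var = pvariance([p[j] for p in data])
        local_var = pvariance([p[j] for p in neighborhood])
        if global_var > 0 and local_var < ratio * global_var:
            relevant.append(j)
    return relevant


def make_cluster(rng, center, tight, n, spread=10.0, jitter=0.05):
    """Hypothetical synthetic cluster: attributes listed in `tight` stay
    within +/- `jitter` of `center`; all others are uniform on [0, spread]."""
    points = [tuple(center)]  # the center itself serves as a query point
    for _ in range(n - 1):
        points.append(tuple(
            center[j] + rng.uniform(-jitter, jitter) if j in tight
            else rng.uniform(0.0, spread)
            for j in range(len(center))
        ))
    return points


rng = random.Random(0)
# Cluster A is concentrated in attributes 0 and 1, cluster B in 2 and 3.
cluster_a = make_cluster(rng, (1.0, 1.0, 5.0, 5.0), tight={0, 1}, n=100)
cluster_b = make_cluster(rng, (5.0, 5.0, 9.0, 9.0), tight={2, 3}, n=100)
data = cluster_a + cluster_b

print(locally_relevant_attributes(data, cluster_a[0], k=30))  # [0, 1]
print(locally_relevant_attributes(data, cluster_b[0], k=30))  # [2, 3]
```

Each cluster is concentrated in a different pair of attributes, so the two query points report different relevant subspaces. Because the k-NN neighborhood is taken in the full space, its variance shrinks somewhat along every attribute; comparing against the global variance with a small `ratio` is what separates the genuinely concentrated attributes from the merely restricted ones.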
