Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation
Ruben Becker, Imane Hafnaoui, Michael E. Houle, Pan Li, Arthur Zimek
In data mining, machine learning, and other areas of AI, we are often faced with datasets that contain many more attributes than are needed, or even helpful, for tasks such as clustering or classification. Problems associated with such high-dimensional data include the concentration effect of distances [13, 20] and the presence of irrelevant features [25, 49]. For clustering [31] and outlier detection [49], researchers have used various techniques to identify relevant subspaces, defined as subsets of features that are informative for a particular task. Relevant subspaces can be determined for individual clusters or outliers in two main ways. One is local density estimation within a systematic search through candidate subspaces, often following the Apriori principle [7] in various adaptations to the subspace search problem [48]. The other is the adaptation of distance measures based on the distribution within local neighborhoods, using an analysis of variance or of covariance (typically based on PCA) so that correlated features can also be accommodated. For sufficiently tight local neighborhoods, the underlying local data manifold can be regarded as approaching a linear form [40], an assumption that further justifies the determination of locally relevant features for subspace determination.
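To illustrate the idea of analysing variance within a local neighborhood, the following is a minimal sketch, not the method proposed in the paper. It assumes a simple heuristic in which a feature counts as locally relevant when its variance among the k nearest neighbors of a query point is small compared with the largest per-feature variance in that neighborhood. The function names, the `ratio` threshold, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def local_feature_variances(data, query_index, k):
    """Per-feature variance over the k nearest neighbors of one point.

    Neighbors are found by Euclidean distance in the full feature space;
    the query point itself (distance 0) is part of its own neighborhood.
    """
    query = data[query_index]
    dists = np.linalg.norm(data - query, axis=1)
    neighbor_ids = np.argsort(dists)[:k]
    return data[neighbor_ids].var(axis=0)


def locally_relevant_features(data, query_index, k, ratio=0.1):
    """Indices of features whose local variance is below `ratio` times
    the largest local variance (an assumed, illustrative criterion)."""
    local_var = local_feature_variances(data, query_index, k)
    return np.flatnonzero(local_var < ratio * local_var.max())


# Synthetic data: a cluster that is tight in features 0 and 1
# and spread evenly along feature 2.
rng = np.random.default_rng(0)
n = 200
data = np.column_stack([
    0.5 + rng.normal(0.0, 0.001, n),
    0.5 + rng.normal(0.0, 0.001, n),
    np.linspace(0.0, 1.0, n),
])

selected = locally_relevant_features(data, query_index=100, k=20)
print(selected)
```

In this toy setting the neighborhood of point 100 varies almost entirely along feature 2, so features 0 and 1 are the ones flagged as locally relevant. A PCA of the same neighborhood would extend the idea to correlated features, at the cost of working with directions rather than individual attributes.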
Jul-15-2019