Intrinsic Dimension of Geometric Data Sets
Tom Hanika, Friedrich Martin Schneider, Gerd Stumme
arXiv.org Artificial Intelligence
The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov's metric measure geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our mathematical model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments.
Dec-24-2018
- Country:
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Europe > France > Île-de-France
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Florida > Orange County > Orlando (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > New York (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Genre:
- Instructional Material > Course Syllabus & Notes (0.46)
- Research Report (0.64)
- Industry:
- Health & Medicine > Therapeutic Area > Oncology (0.92)
- Technology: