Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement
Murtagh, Fionn; Contreras, Pedro
In areas such as search, matching, retrieval and general data analysis, the massive increase in data requires new methods that can cope with the explosion in volume and dimensionality of the available data. In this work, the Baire metric, which is also an ultrametric, is used to induce a hierarchy and in turn to support clustering, matching and other operations. Arising directly out of the Baire distance is an ultrametric tree, which can also be seen as a tree that hierarchically clusters the data. This offers a number of advantages when storing and retrieving data. When the data source is in numerical form, this ultrametric tree can be used as an index structure, making matching and search, and thus retrieval, much easier. The clusters can be associated with hash keys; that is to say, the cluster members can be mapped onto "bins" or "buckets".
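The idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: values lie in [0, 1) and are given as decimal strings, the base is 10, the distance between two values is taken as base^(-k) where k is the length of their longest common digit prefix, and values that agree on every digit up to the chosen precision are treated as being at distance 0. The function names (`digits`, `baire_distance`, `bucket_by_prefix`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def digits(x: str, precision: int) -> str:
    """Return the first `precision` fractional digits of a decimal string
    such as '0.4251', right-padded with zeros (assumption: x is in [0, 1))."""
    frac = x.split(".", 1)[1] if "." in x else ""
    return frac[:precision].ljust(precision, "0")


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two digit strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(x: str, y: str, precision: int = 4, base: int = 10) -> float:
    """Assumed form of the Baire distance: base**(-k), where k is the
    longest common prefix length; 0.0 if all digits agree at this precision."""
    dx, dy = digits(x, precision), digits(y, precision)
    if dx == dy:
        return 0.0
    return float(base) ** -common_prefix_len(dx, dy)


def bucket_by_prefix(values, depth: int, precision: int = 4) -> dict:
    """Map each value to a bucket keyed by its first `depth` digits.
    One pass over the data, so the cost is linear in the number of values."""
    buckets = defaultdict(list)
    for v in values:
        buckets[digits(v, precision)[:depth]].append(v)
    return dict(buckets)


values = ["0.4251", "0.4267", "0.4310", "0.9120"]
level1 = bucket_by_prefix(values, depth=1)  # coarse clusters
level2 = bucket_by_prefix(values, depth=2)  # finer clusters nested in level1
```

Each digit prefix acts as a hash key, and the buckets at depth 2 are nested inside the buckets at depth 1, which is what makes the structure a hierarchy. Looking up the values similar to a query then amounts to computing the query's prefix and reading one bucket.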
Nov-27-2011
- Country:
- Europe (0.68)
- North America > United States
- California > San Francisco County > San Francisco (0.14)
- Genre:
- Research Report (0.64)
- Technology: