Supervised Learning Approach to Approximate Nearest Neighbor Search
Hyvönen, Ville, Jääsaari, Elias, Roos, Teemu
Approximate nearest neighbor search is a classic algorithmic problem where the goal is to design an efficient index structure for fast approximate nearest neighbor queries. We show that it can be framed as a classification problem and solved by training a suitable multi-label classifier and using it as an index. Compared to the existing algorithms, this supervised learning approach has several advantages: it enables adapting an index to the query distribution when the query distribution and the corpus distribution differ; it allows using training sets larger than the corpus; and in principle it enables using any multi-label classifier for approximate nearest neighbor search. We demonstrate these advantages on multiple synthetic and real-world data sets by using a random forest and an ensemble of random projection trees as the base classifiers. Introduction In k -nearest neighbor ( k -nn) search, k points that are nearest to the query point are retrieved from the corpus. Approximate nearest neighbor search is used to speed up k -nn search in applications where fast response times are critical, such as in computer vision, robotics, and recommendation systems. Traditionally, approximate nearest neighbor search is approached as a problem in algorithms and data structures. Space-partitioning methods--trees, hashing, and quantization--divide the space according to a geometric criterion. For instance, k -d trees (Bentley 1975) and principal component trees (McNames 2001) are grown by hierarchically partitioning the space along the maximum variance directions of the corpus.
Oct-18-2019
- Country:
- Asia (0.04)
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Finland > Uusimaa
- Helsinki (0.04)
- United Kingdom > England
- Genre:
- Research Report > New Finding (0.46)