Supervised Learning Approach to Approximate Nearest Neighbor Search

Hyvönen, Ville, Jääsaari, Elias, Roos, Teemu

Oct-18-2019–arXiv.org Machine Learning

Approximate nearest neighbor search is a classic algorithmic problem where the goal is to design an efficient index structure for fast approximate nearest neighbor queries. We show that it can be framed as a classification problem and solved by training a suitable multi-label classifier and using it as an index. Compared to the existing algorithms, this supervised learning approach has several advantages: it enables adapting an index to the query distribution when the query distribution and the corpus distribution differ; it allows using training sets larger than the corpus; and in principle it enables using any multi-label classifier for approximate nearest neighbor search. We demonstrate these advantages on multiple synthetic and real-world data sets by using a random forest and an ensemble of random projection trees as the base classifiers. Introduction In k -nearest neighbor ( k -nn) search, k points that are nearest to the query point are retrieved from the corpus. Approximate nearest neighbor search is used to speed up k -nn search in applications where fast response times are critical, such as in computer vision, robotics, and recommendation systems. Traditionally, approximate nearest neighbor search is approached as a problem in algorithms and data structures. Space-partitioning methods--trees, hashing, and quantization--divide the space according to a geometric criterion. For instance, k -d trees (Bentley 1975) and principal component trees (McNames 2001) are grown by hierarchically partitioning the space along the maximum variance directions of the corpus.

artificial intelligence, natural language, nearest neighbor search, (20 more...)

arXiv.org Machine Learning

Oct-18-2019

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Nearest Neighbor Methods (0.50)
  - Natural Language > Information Retrieval (1.00)
  - Representation & Reasoning > Case-Based Reasoning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found