Learning Sublinear-Time Indexing for Nearest Neighbor Search

Most of the efficient sublinear-time indexing algorithms for the high-dimensional nearest neighbor search problem (NNS) are based on space partitions of the ambient space $\mathbb{R}^d$. Inspired by recent theoretical work on NNS for general metric spaces [Andoni, Naor, Nikolov, Razenshteyn, Waingarten STOC 2018, FOCS 2018], we develop a new framework for constructing such partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner [Sanders, Schulz SEA 2013] and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS, our experiments show that the partitions found by Neural LSH consistently outperform partitions found by quantization- and tree-based methods.

Approximate Nearest Neighbor Search in High Dimensions

The nearest neighbor problem is defined as follows: Given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and $P$. The problem has a wide range of applications in machine learning, computer vision, databases and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the {\em approximate} nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem have been developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry.
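
The definition above can be made concrete with a small sketch. It assumes Euclidean distance as the metric $D$ and uses a linear scan, which is exactly the cost an index is meant to avoid; the function names `nearest_neighbor` and `is_c_approximate` are illustrative and do not come from the article.

```python
import math

def nearest_neighbor(points, q):
    """Exact nearest neighbor by linear scan (the baseline an index tries to beat)."""
    return min(points, key=lambda p: math.dist(p, q))

def is_c_approximate(points, q, candidate, c):
    """True if D(q, candidate) <= c * min_p D(q, p), i.e. candidate is a valid c-approximate answer."""
    best = min(math.dist(p, q) for p in points)
    return math.dist(candidate, q) <= c * best

# Toy data set P and query q (assumed values, for illustration only).
P = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
q = (1.0, 0.0)
exact = nearest_neighbor(P, q)
```

With $c = 2$, the point `(3.0, 0.0)` is an acceptable answer for this query even though it is not the exact nearest neighbor, while `(0.0, 5.0)` is not.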

Towards Similarity Graphs Constructed by Deep Reinforcement Learning

Similarity graphs are an active research direction for the nearest neighbor search (NNS) problem. New algorithms for similarity graph construction are continuously being proposed and analyzed by both theoreticians and practitioners. However, existing construction algorithms are mostly based on heuristics and do not explicitly maximize the target performance measure, i.e., search recall. Therefore, at the moment it is not clear whether the performance of similarity graphs has plateaued or more effective graphs can be constructed with more theoretically grounded methods. In this paper, we introduce a new principled algorithm, based on adjacency matrix optimization, which explicitly maximizes search efficiency. Namely, we propose a probabilistic model of a similarity graph defined in terms of its edge probabilities and show how to learn these probabilities from data as a reinforcement learning task. As confirmed by experiments, the proposed construction method can be used to refine the state-of-the-art similarity graphs, achieving higher recall rates for the same number of distance computations. Furthermore, we analyze the learned graphs and reveal the structural properties that are responsible for more efficient search.

Practical and Optimal LSH for Angular Distance

We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [1, 2]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [3] in practice. We also introduce a multiprobe version of this algorithm and conduct an experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a tradeoff between evaluation time and quality that is close to optimal for a natural class of LSH functions.
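
For orientation, the following is a minimal sketch of the hyperplane LSH baseline that the abstract compares against, not of the new family the abstract announces. Each hash bit records on which side of a random Gaussian hyperplane a vector falls, so two vectors at angle $\theta$ agree on a given bit with probability $1 - \theta/\pi$. The helper names, the seed, and the number of bits are assumptions made for the example.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplane normals with i.i.d. Gaussian coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hyperplane_hash(planes, x):
    """One bit per hyperplane: 1 if x lies on the non-negative side, else 0."""
    return tuple(
        1 if sum(w * xi for w, xi in zip(plane, x)) >= 0 else 0
        for plane in planes
    )

planes = make_hyperplanes(dim=3, num_bits=8)
x = [1.0, 2.0, -0.5]
code = hyperplane_hash(planes, x)
```

Because only the sign of each inner product matters, the hash depends on the direction of `x` and not on its length, which is why this family suits angular distance.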

Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph

We introduce a new nearest neighbor search algorithm. The algorithm builds a nearest neighbor graph in an offline phase and, when queried with a new point, performs hill-climbing starting from a randomly sampled node of the graph. We provide theoretical guarantees for the accuracy and the computational complexity and empirically show the effectiveness of this algorithm.
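
The two phases described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: the graph is built by brute force, the distance is Euclidean, and the number of random restarts is a hypothetical parameter. The names `build_knn_graph`, `hill_climb`, and `search` are invented for the example.

```python
import math
import random

def build_knn_graph(points, k):
    """Offline phase: link each point to its k nearest neighbors (brute force, O(n^2 log n))."""
    graph = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph.append(others[:k])
    return graph

def hill_climb(points, graph, q, start):
    """Query phase: move to the neighbor closest to q until no neighbor improves."""
    current = start
    current_d = math.dist(points[current], q)
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], q))
        best_d = math.dist(points[best], q)
        if best_d >= current_d:
            return current  # local minimum of the distance to q
        current, current_d = best, best_d

def search(points, graph, q, restarts=3, seed=0):
    """Run hill-climbing from several random start nodes and keep the best result."""
    rng = random.Random(seed)
    found = [hill_climb(points, graph, q, rng.randrange(len(points)))
             for _ in range(restarts)]
    return min(found, key=lambda i: math.dist(points[i], q))

# Toy example: ten points on a line, each linked to its 2 nearest neighbors.
points = [(float(i), 0.0) for i in range(10)]
graph = build_knn_graph(points, k=2)
result = search(points, graph, q=(6.2, 0.0))
```

On general data hill-climbing can stop at a local minimum that is not the true nearest neighbor, which is why the sketch restarts from several random nodes.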