Case-Based Reasoning


An algorithm for L1 nearest neighbor search via monotonic embedding

Neural Information Processing Systems

Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance. Here we develop an approach for L1 distance that begins with an explicit and exact embedding of the points into L2. We show how this embedding can efficiently be combined with random projection methods for L2 NN search, such as locality-sensitive hashing or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation that it is competitive in practice with available alternatives.
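
A hedged sketch of the pipeline this abstract describes may help: embed the points into L2 with an explicit embedding, then hand the embedded points to a standard random-projection method for L2 NN search. The abstract does not spell out the authors' embedding, so the sketch below uses a unary (grid) encoding of bounded non-negative integer data purely as a stand-in, together with a minimal E2LSH-style hash table; all names, parameters, and the choice of embedding are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from collections import defaultdict

def unary_embed(X, levels):
    """Stand-in exact embedding of bounded non-negative integer data into L2:
    each coordinate becomes a unary vector of length `levels`, so the squared
    L2 distance between embeddings equals the original L1 distance (and the
    L2 distance is a monotone function of it). Illustrative only; this is
    NOT the paper's embedding."""
    n, d = X.shape
    grid = np.arange(levels)                                   # thresholds 0..levels-1
    return (X[:, :, None] > grid).reshape(n, d * levels).astype(np.float32)

class L2HashIndex:
    """Minimal E2LSH-style hash table over the embedded points (sketch only)."""
    def __init__(self, Y, n_hashes=8, width=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((n_hashes, Y.shape[1]))   # random projections
        self.b = rng.uniform(0, width, n_hashes)
        self.w = width
        self.buckets = defaultdict(list)
        for i, key in enumerate(self._keys(Y)):
            self.buckets[key].append(i)

    def _keys(self, Y):
        return map(tuple, np.floor((Y @ self.A.T + self.b) / self.w).astype(int))

    def candidates(self, y):
        return self.buckets.get(next(self._keys(y[None, :])), [])

# usage: embed, index, then re-rank bucket candidates by exact L1 distance
X = np.random.randint(0, 16, size=(1000, 8))
index = L2HashIndex(unary_embed(X, levels=16))
q = np.random.randint(0, 16, size=8)
cand = index.candidates(unary_embed(q[None, :], levels=16)[0])
best = min(cand, key=lambda i: np.abs(X[i] - q).sum()) if cand else None
```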


Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators

Neural Information Processing Systems

We provide finite-sample analysis of a general framework for using k-nearest neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires k → ∞ as the sample size n → ∞) into the functional of interest, the estimators we consider fix k and perform a bias correction. This can be more efficient computationally, and, as we show, statistically, leading to faster convergence rates. Our framework unifies several previous estimators, for most of which ours are the first finite-sample guarantees.
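
As a concrete member of this fixed-k, bias-corrected family, the sketch below implements a Kozachenko-Leonenko style entropy estimator in which the digamma terms play the role of the bias correction; it is offered as an illustration of the class of estimators described above, not as the paper's own code.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def fixed_k_entropy(X, k=3):
    """Kozachenko-Leonenko style entropy estimate (in nats) with fixed k.

    The digamma terms provide the bias correction that replaces letting
    k grow with n; this is one member of the family discussed above."""
    n, d = X.shape
    tree = cKDTree(X)
    # query k+1 neighbors because each point's nearest neighbor is itself
    dists, _ = tree.query(X, k=k + 1)
    eps = dists[:, -1]                                  # distance to the k-th true neighbor
    log_unit_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(eps))

# sanity check: the entropy of a standard normal in d dimensions is 0.5*d*log(2*pi*e)
X = np.random.default_rng(0).standard_normal((5000, 2))
print(fixed_k_entropy(X, k=3), 0.5 * 2 * np.log(2 * np.pi * np.e))
```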


Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

Neural Information Processing Systems

Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search queries with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least 0.1n steps on instances of size n before it encounters any of the 5 nearest neighbors of the query.
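
The query procedures of HNSW, NSG, and DiskANN are all variants of best-first (beam) search on a proximity graph, and the "steps" counted in the lower bound above correspond to node expansions in such a search. The sketch below is a generic, schematic version under assumed data structures (`adj` adjacency lists, beam width `ef`), not any library's actual implementation.

```python
import heapq
import numpy as np

def beam_search(query, vectors, adj, entry, ef=16):
    """Generic best-first (beam) search on a proximity graph.

    `adj[v]` is the out-neighbor list of node v and `ef` is the beam width;
    both are assumptions of this sketch rather than any library's API.
    Returns the beam sorted by distance and the number of expansion steps."""
    dist = lambda v: float(np.linalg.norm(vectors[v] - query))
    visited = {entry}
    frontier = [(dist(entry), entry)]          # min-heap of unexpanded candidates
    beam = [(-dist(entry), entry)]             # max-heap (negated) of best ef found so far
    steps = 0
    while frontier:
        d, v = heapq.heappop(frontier)
        if d > -beam[0][0] and len(beam) >= ef:
            break                              # nothing left in the frontier can improve the beam
        steps += 1
        for u in adj[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            if len(beam) < ef or du < -beam[0][0]:
                heapq.heappush(frontier, (du, u))
                heapq.heappush(beam, (-du, u))
                if len(beam) > ef:
                    heapq.heappop(beam)
    return sorted((-nd, v) for nd, v in beam), steps
```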


A Multilabel Classification Framework for Approximate Nearest Neighbor Search

Neural Information Processing Systems

Both supervised and unsupervised machine learning algorithms have been used to learn partition-based index structures for approximate nearest neighbor (ANN) search. Existing supervised algorithms formulate the learning task as finding a partition in which the nearest neighbors of a training set point belong to the same partition element as the point itself, so that the nearest neighbor candidates can be retrieved by naive lookup or backtracking search. We formulate candidate set selection in ANN search directly as a multilabel classification problem where the labels correspond to the nearest neighbors of the query point, and interpret the partitions as partitioning classifiers for solving this task. Empirical results suggest that the natural classifier based on this interpretation leads to strictly improved performance when combined with any unsupervised or supervised partitioning strategy. We also prove a sufficient condition for consistency of a partitioning classifier for ANN search, and illustrate the result by verifying this condition for chronological k-d trees.
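
One way to read the "partitions as classifiers" view is the following toy sketch: build an ensemble of independent partitions (here k-means clusterings, chosen only for illustration), score each corpus point by the fraction of partitions that place it in the same element as the query, and take the highest-scoring points as the candidate set before re-ranking by exact distance. The ensemble type, scoring rule, and parameter names are assumptions for illustration, not the authors' exact construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_ensemble(X, n_partitions=8, n_cells=64, seed=0):
    """Fit an ensemble of independent partitions (k-means, for illustration)."""
    models = [KMeans(n_clusters=n_cells, n_init=1, random_state=seed + t).fit(X)
              for t in range(n_partitions)]
    assignments = np.stack([m.labels_ for m in models])       # shape (T, n)
    return models, assignments

def query_candidates(q, X, models, assignments, n_candidates=100):
    """Score each point by how many partitions co-locate it with the query,
    then re-rank the top-scoring candidates by exact distance."""
    scores = np.zeros(len(X), dtype=int)
    for m, labels in zip(models, assignments):
        cell = m.predict(q[None, :])[0]
        scores += (labels == cell)
    cand = np.argsort(-scores)[:n_candidates]
    return cand[np.argsort(np.linalg.norm(X[cand] - q, axis=1))]
```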


Falconn++: A Locality-sensitive Filtering Approach for Approximate Nearest Neighbor Search

Neural Information Processing Systems

Falconn++ can filter out potentially far-away points in any hash bucket before querying, which results in higher-quality candidates compared to other hashing-based solutions. Theoretically, Falconn++ asymptotically achieves lower query time complexity than Falconn, an optimal locality-sensitive hashing scheme for angular distance. Empirically, Falconn++ achieves a higher recall-speed tradeoff than Falconn on many real-world data sets. Falconn++ is also competitive with HNSW, an efficient representative of graph-based solutions, in high search recall regimes.
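
The abstract only states that points are filtered out of hash buckets before querying. As a rough illustration of the locality-sensitive filtering idea (and emphatically not Falconn++'s actual index), the sketch below assigns each point to the random direction it aligns with best and keeps only the points with the largest projections in each bucket, so likely far-away points never enter the candidate set.

```python
import numpy as np

class FilteredHashIndex:
    """Toy locality-sensitive filtering index for angular distance (sketch).

    Each point is assigned to the random direction it aligns with best, and
    each bucket retains only the `bucket_size` points with the largest
    projections. Parameters and structure are illustrative assumptions."""
    def __init__(self, X, n_directions=128, bucket_size=32, seed=0):
        rng = np.random.default_rng(seed)
        self.R = rng.standard_normal((n_directions, X.shape[1]))
        self.X = X / np.linalg.norm(X, axis=1, keepdims=True)
        proj = self.X @ self.R.T                               # (n, n_directions)
        best_dir = proj.argmax(axis=1)
        self.buckets = {}
        for h in range(n_directions):
            members = np.flatnonzero(best_dir == h)
            order = np.argsort(-proj[members, h])              # keep top projections only
            self.buckets[h] = members[order[:bucket_size]]

    def query(self, q, k=10):
        q = q / np.linalg.norm(q)
        cand = self.buckets[int((self.R @ q).argmax())]
        if len(cand) == 0:
            return cand
        return cand[np.argsort(-(self.X[cand] @ q))[:k]]
```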


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper proposes the Latent Case Model (LCM), a Bayesian approach to clustering in which clusters are represented by a prototype (a specific sample from the data) and feature subspaces (a binary subset of the variables signifying those features that are relevant to the class). The approach is presented as being a Bayesian, trainable version of the Case-Based Reasoning approach popular in AI, and is motivated by the ways such models have proved highly effective in explaining human decision making. The generative model (Figure 1) represents each item as coming from a mixture of S clusters, where each cluster is represented by a prototype and subspace (as above) and a function \phi which generates features matching those of the prototype with high probability for features in the subspace, and uniform features outside it. The model is thus similar in functionality to LDA but quite different in terms of its representation. Experimental results are shown for "prediction accuracy" (more on this later) and unsupervised accuracy of the clusters vs. a baseline of LDA (Figure 2), and the results seem superior/comparable to LDA.
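
Based only on the description in this review, a toy version of the generative process might look like the sketch below; the parameter names and the "copy the prototype with high probability, otherwise uniform" mechanics are assumptions made for illustration, not the paper's model specification.

```python
import numpy as np

def sample_lcm_toy(prototypes, subspaces, weights, n_items, n_values,
                   match_prob=0.9, seed=0):
    """Toy generative sketch: each item picks a cluster, then each feature
    copies the cluster prototype with high probability if it lies in the
    cluster's subspace, and is uniform over the n_values values otherwise."""
    rng = np.random.default_rng(seed)
    S, D = prototypes.shape
    items = np.empty((n_items, D), dtype=int)
    for i in range(n_items):
        s = rng.choice(S, p=weights)                # pick a cluster
        for d in range(D):
            if subspaces[s, d] and rng.random() < match_prob:
                items[i, d] = prototypes[s, d]      # match the prototype
            else:
                items[i, d] = rng.integers(n_values)  # uniform feature
    return items
```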


Reviews: Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing Systems

The paper presents a simple and clear divide-and-conquer scheme for distributed classification in the nearest neighbour framework. The general methods presented are not entirely new, but they are accompanied by a crisp statistical analysis which proves tight convergence rates to the Bayes optimal classifier. The novelty of this algorithm is its completely distributed nature: very little information must be transferred between different computing units, as opposed to previous algorithms, which required more data transfer between them. The claims are clear and understandable, the theoretical part is very satisfactory, and a nice complementary empirical section shows improved speed (in the denoised case) as well as improved regret. On the other hand, I add a few points to explain the overall score I have given this submission: 1) The algorithm is rather similar to the denoising algorithm referred to in the paper.


Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing Systems

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide-and-conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with a proven convergence rate. We find that in the distributed setting, the optimal choice of the number of neighbors k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter. Numerical studies have verified the theoretical findings.
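
The divide-and-conquer scheme is simple enough to sketch directly: split the data into subsamples, fit a k-NN classifier on each, and aggregate the subsample predictions by majority vote. The sketch below uses scikit-learn's KNeighborsClassifier as the base learner (our implementation choice, not the authors' code); per the theory summarized above, k should scale with both the total sample size and the number of partitions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bignn_predict(X, y, X_query, n_parts=10, k=5, seed=0):
    """Schematic bigNN: k-NN on random subsamples, aggregated by majority vote.
    Assumes integer labels 0..(n_classes-1)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_classes = int(y.max()) + 1
    votes = np.zeros((len(X_query), n_classes), dtype=int)
    for part in np.array_split(idx, n_parts):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X[part], y[part])
        votes[np.arange(len(X_query)), clf.predict(X_query)] += 1
    return votes.argmax(axis=1)
```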


Review for NeurIPS paper: On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing Systems

Summary and Contributions: Update: Thanks for addressing the concerns raised by the reviewers. Based on re-reading the paper and going over the comments, I am able to understand the experiments better, and based on the authors' comment that they will revise the draft to make things more clear, I will change my score to accept. Having said that, I would still keep my confidence low, since I am unable to accurately assess the significance of the result, and I believe that would be a key factor to consider in a novel theoretical paper. The result is based on two key properties of the transformed space that the authors identify. The first is 'safety', which is a measure of how well we can recover the posterior in the original space from the feature space. The second is smoothness, which is a measure of how hard it is to recover the posterior in the original space from the feature space.


Review for NeurIPS paper: On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing Systems

This paper provides some interesting theoretical insights into the convergence of kNN over feature transformations, backed up by some empirical results. All three reviewers argue for acceptance, but have also provided some directions for improvement, which were acknowledged by the authors in their feedback, with a promise to include these changes in the final version. Personally, I have one issue with the paper: it introduces some datasets in the experimental section without providing any results for them. These are supplied in the additional material, which to me feels like cheating.