Goto

Collaborating Authors

 Case-Based Reasoning


Rates of Convergence for Nearest Neighbor Classification

Neural Information Processing Systems

We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. These are more general than existing bounds, and enable us, as a by-product, to establish the universal consistency of nearest neighbor in a broader range of data spaces than was previously known. We illustrate our upper and lower bounds by introducing a new smoothness class customized for nearest neighbor classification. We find, for instance, that under the Tsybakov margin condition the convergence rate of nearest neighbor matches recently established lower bounds for nonparametric classification.


The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification

Neural Information Processing Systems

We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.


Reviews: Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing Systems

The paper presents a simple and clear algorithm of a divide-and-conquer scheme of distributed classification using the nearest neighbour framework. The general methods presented are not entirely new, but are accompanied by a crisp statistical analysis which proves tight convergence rates to the Bayes optimal classifier. The novelty in this algorithm is the complete distributed nature of it - the fact that very little information must be transferred between different computing units as opposed to previous work algorithms which compelled more data transference between them. The claims are clear and understandable, and the theoretical part is very satisfactory, as well as a nice complementary empirical section showing improved speeds (in the denoised case) as well as improved regret. On the other hand, I add a few points to explain the overall score I have given this submission: 1) The algorithm is rather similar to the denoising algorithm referred to in the paper.


Review for NeurIPS paper: On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing Systems

Summary and Contributions: Update: Thanks for addressing the concerns raised by the reviewers, based on re-reading the paper and going over the comments, I am able to understand the experiments better - and based on the authors comments that they will revise the draft to make things more clear, I will change my score to accept. Having said that, I would still keep my confidence low since I am unable to accurately access the significance of the result and I believe that would be a key factor to consider in a novel theoretical paper. The result is based on two key properties of the transformed space that they identify. The first is'safety', which is a measure of how well can we recover the posterior in the original space from the feature space. The second is smoothness, which is a measure of how hard it is to recover the posterior in the original space from the feature space.


Review for NeurIPS paper: On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing Systems

This paper provides some interesting theoretical insights into the convergence of kNN over feature transformations. This is backed up by some empirical results. All three reviewers argue for acceptance, but have also provided some directions for improvement, which was acknowledged by the authors in their feedback, promising to include these changes in the final version. Personally I have one issue with the paper, which is introducing some datasets in the experimental section, without providing any results. These are supplied in the additional material, to me that feels like cheating.


Reviews: An adaptive nearest neighbor rule for classification

Neural Information Processing Systems

After reading the author feedback, I am increasing my score as the authors have largely addressed my main concern of comparison/relating to existing locally adaptive NN estimators. The authors address this concern on the theory side. I still think additional numerical experiments to compare the proposed method with other adaptive NN approaches would strengthen the paper and be beneficial to practitioners (and could possibly lead to interesting theoretical questions if, for instance, there are some surprising performance gaps). The analysis is quite clean and elegant due to the newly introduced local notion of "advantage" as an alternative to the usual local Lipschitz conditions. The paper, however, does not discuss enough related work, especially adaptive nearest neighbor and kernel methods based on Lepski's method (for example, the adaptive k-NN method by Kpotufe (2011)).


Reviews: An adaptive nearest neighbor rule for classification

Neural Information Processing Systems

The paper proposes a variant of the k-Nearest Neighbors algorithm (called adaptive knn) in which k is chosen for each example to classify, instead of being tuned as a global hyperparameter. To do so, the authors define a new notion that applied locally in the input space they call the advantage, instead of the local Lipschitz condition that is often used in such setting. An important contribution of the paper is the prove that the proposed algorithm is consistent and have pointwise convergence at the limit. The proposed notion of advantage is allers related to some error bounds for pointwise convergence. The experimental part is clearly sufficient for this type of paper, even if there is no comparison with other state-of-the-art algorithm.


HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory

Neural Information Processing Systems

The state-of-the-art approximate nearest neighbor search (ANNS) algorithms face a fundamental tradeoff between query latency and accuracy, because of small main memory capacity: To store indices in main memory for short query latency, the ANNS algorithms have to limit dataset size or use a quantization scheme which hurts search accuracy. The emergence of heterogeneous memory (HM) brings a solution to significantly increase memory capacity and break the above tradeoff: Using HM, billions of data points can be placed in the main memory on a single machine without using any data compression. However, HM consists of both fast (but small) memory and slow (but large) memory, and using HM inappropriately slows down query significantly. In this work, we present a novel graph-based similarity search algorithm called HM-ANN, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression. On two billion-sized datasets BIGANN and DEEP1B, HM-ANN outperforms state-of-the-art compression-based solutions such as L&C and IMI OPQ in recall-vs-latency by a large margin, obtaining 46% higher recall under the same search latency. We also extend existing graph-based methods such as HNSW and NSG with two strong baseline implementations on HM.


Reviews: Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

As a non theory person I was able to follow the motivation, the problem set-up and the main result. The intuition that not all queries Q(j,k) needs to be issued conditioned on the past queries due to the fact that we're in a metric space. The bounds looks fine, mostly it is a construction of the confidence in the estimation of Q(j,k), again, under the constraint that Q(j,k) is a metric space with metric-ish properties. As a piece of technical achievement this paper is just fine. But it can improve in the following sense of story-telling and organisation: 1) we're doing an optimal query problem, namely, querying a noisey oracle Q(j,k) to construct a nearest-neighbor graph G(x_i,x_j).


Reviews: Learning Nearest Neighbor Graphs from Noisy Distance Samples

Neural Information Processing Systems

This paper defines a new learning problem of learning 1-NN graph on a metric space consisting of n points using a noisy distance oracle. Ignoring the metric structure, the problem can be viewed n separate best arm identification problems (for each node find its nearest neighbor by calling the oracle). The best arm identification problem can solved by UCB-like methods that maintain confidence intervals for each distance and has been studied extensively in the literature. The paper proposes an improved algorithm that exploits the triangle inequality and symmetry of the underlying metric in order to improve the confidence intervals. As the paper shows that the improvement doesn't help much for the worst-case metric space.