

Case-Based Reasoning


We address the questions raised by each reviewer separately. Layer Choice (R2, R3): the layer can be chosen depending on the size of the nearest-neighbor patch the user would like to visualize.

Neural Information Processing Systems

We thank all reviewers for their constructive comments. In our previous experiments we fixed the architecture g to a two-layer neural network. We discussed above how to select the layer and its impact. The computational cost is low since the pretrained model is fixed and we only optimize g and c. We did not test on ImageNet since we cannot visualize results for all 1000 classes.


Interpretable Machine Learning for Oral Lesion Diagnosis through Prototypical Instances Identification

arXiv.org Artificial Intelligence

Decision-making processes in healthcare can be highly complex and challenging. Machine learning tools offer significant potential to assist in these processes. However, many current methodologies rely on complex models that are not easily interpretable by experts. This underscores the need to develop interpretable models that can provide meaningful support in clinical decision-making. When approaching such tasks, humans typically compare the situation at hand to a few key examples and representative cases imprinted in their memory. An approach that selects such exemplary cases and grounds its predictions on them could therefore yield high-performing, interpretable solutions to such problems. To this end, we evaluate PivotTree, an interpretable prototype selection model, on an oral lesion detection problem, specifically detecting the presence of neoplastic, aphthous, and traumatic ulcerated lesions in oral cavity images. We demonstrate the efficacy of this method in terms of performance and offer a qualitative and quantitative comparison between the selected exemplary cases and ground-truth prototypes chosen by experts.
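To make the case-based idea concrete, here is a minimal nearest-prototype classifier sketch: per-class medoids serve as the exemplary cases and a query is labeled by its closest prototype. This is a generic illustration of prototype-grounded prediction, not the PivotTree algorithm itself; all function names are ours.

```python
# Generic nearest-prototype classification sketch (not PivotTree itself).
import numpy as np

def class_medoids(X, y):
    """Pick one exemplary case per class: the most central training example."""
    protos, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        d = np.linalg.norm(Xc[:, None] - Xc[None], axis=-1).sum(axis=1)
        protos.append(Xc[int(np.argmin(d))])   # medoid of class c
        labels.append(c)
    return np.array(protos), np.array(labels)

def nearest_prototype_predict(x, protos, labels):
    """Ground the prediction in the closest exemplary case."""
    return labels[int(np.argmin(np.linalg.norm(protos - x, axis=1)))]
```

A prediction made this way is interpretable by construction: the supporting prototype can be shown to the clinician alongside the predicted label.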


SOAR: Improved Indexing for Approximate Nearest Neighbor Search

Neural Information Processing Systems

This paper introduces SOAR: Spilling with Orthogonality-Amplified Residuals, a novel data indexing technique for approximate nearest neighbor (ANN) search. SOAR extends upon previous approaches to ANN search, such as spill trees, that utilize multiple redundant representations while partitioning the data to reduce the probability of missing a nearest neighbor during search. Rather than training and computing these redundant representations independently, however, SOAR uses an orthogonality-amplified residual loss, which optimizes each representation to compensate for cases where other representations perform poorly. This drastically improves the overall index quality, resulting in state-of-the-art ANN benchmark performance while maintaining fast indexing times and low memory consumption.
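The key idea is to choose each point's redundant (spilled) assignment so that its residual compensates for the primary one. A minimal sketch of such a selection rule follows; the weighting constant lam and all names are our assumptions, and the paper's actual method optimizes the representations with a loss rather than merely picking assignments this way.

```python
# Sketch of an orthogonality-amplified secondary assignment (illustrative).
import numpy as np

def soar_assign(points, centroids, lam=1.0):
    """Return (primary, secondary) centroid indices for each point."""
    primary, secondary = [], []
    for x in points:
        residuals = x - centroids                 # residual to every centroid
        dists = np.linalg.norm(residuals, axis=1)
        p = int(np.argmin(dists))                 # primary: nearest centroid
        r_hat = residuals[p] / (np.linalg.norm(residuals[p]) + 1e-12)
        # Penalize the component of each candidate residual that is parallel
        # to the primary residual: prefer near-orthogonal secondary residuals.
        parallel = residuals @ r_hat
        scores = dists ** 2 + lam * parallel ** 2
        scores[p] = np.inf                        # secondary must differ
        primary.append(p)
        secondary.append(int(np.argmin(scores)))
    return np.array(primary), np.array(secondary)
```

The intuition: when the primary partition fails for a query, it typically fails along the primary residual direction, so a secondary residual orthogonal to that direction is most likely to recover the missed neighbor.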


CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search

Neural Information Processing Systems

State-of-the-art approximate nearest neighbor search (ANNS) algorithms build a large proximity graph on the dataset and perform a greedy beam search, which may incur many unnecessary explorations. We develop a novel framework, namely crossing sparse proximity graph (CSPG), based on random partitioning of the dataset. It produces a smaller sparse proximity graph for each partition, along with routing vectors that bind all the partitions together. An efficient two-stage approach is designed for exploring CSPG, with fast approaching and cross-partition expansion. We theoretically prove that CSPG can accelerate existing graph-based ANNS algorithms by reducing unnecessary explorations. In addition, we conduct extensive experiments on benchmark datasets.
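A rough sketch of this two-stage scheme is below, under simplifying assumptions: partitions are uniform random, each partition gets a brute-force k-NN graph, and the routing vectors are replaced by arbitrary per-partition entry points. All names are illustrative, not the paper's implementation.

```python
# Simplified CSPG-style index and two-stage search (illustrative only).
import numpy as np

def build_cspg(data, n_parts=4, k=8, seed=0):
    """Randomly partition the data; build a small k-NN graph per partition."""
    rng = np.random.default_rng(seed)
    part = rng.integers(n_parts, size=len(data))
    graphs = []
    for p in range(n_parts):
        ids = np.flatnonzero(part == p)
        d = np.linalg.norm(data[ids, None] - data[None, ids], axis=-1)
        order = np.argsort(d, axis=1)[:, 1:k + 1]          # skip self
        graphs.append({int(ids[i]): [int(ids[j]) for j in order[i]]
                       for i in range(len(ids))})
    return graphs

def greedy(q, data, graph, start):
    """Stage 1 (fast approaching): greedy walk inside one partition."""
    cur = start
    while graph[cur]:
        cand = min(graph[cur], key=lambda i: np.linalg.norm(data[i] - q))
        if np.linalg.norm(data[cand] - q) >= np.linalg.norm(data[cur] - q):
            break
        cur = cand
    return cur

def cspg_search(q, data, graphs):
    """Stage 2 (cross-partition expansion): keep the best local optimum."""
    best = None
    for g in graphs:
        if not g:
            continue
        local = greedy(q, data, g, next(iter(g)))          # arbitrary entry
        if best is None or (np.linalg.norm(data[local] - q)
                            < np.linalg.norm(data[best] - q)):
            best = local
    return best
```

Because each per-partition graph is small and sparse, each greedy walk is short, which is the source of the claimed reduction in unnecessary explorations.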


LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search

Neural Information Processing Systems

Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms, which use score computation methods based on product quantization (PQ), are often used in industrial-scale applications due to their scalability and suitability for distributed and disk-based implementations. However, they have slower query times than the leading graph-based ANN algorithms. In this work, we propose a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression. Our experiments show that on modern high-dimensional data sets, the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method.
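The core observation can be made concrete in a few lines. Within one cluster holding points X, the scores X @ q are a linear (multi-output) function of the query q, so they can be approximated by a rank-r factorization. The sketch below uses the truncated SVD of X, which is the reduced-rank solution for isotropic queries, whereas the paper fits the regression on training queries; this is an illustration of the principle, not the LoRANN library API.

```python
# Rank-r inner-product scoring for one cluster (illustrative sketch).
import numpy as np

def fit_rrr_scorer(X, rank):
    """Fit factors so X @ q ~= U @ (V @ q): O(r(m + d)) per query vs O(md)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]     # shapes (m, r) and (r, d)

def approx_scores(U, V, q):
    return U @ (V @ q)                           # approximate inner products

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128))             # points in one cluster
q = rng.standard_normal(128)                     # query vector
U, V = fit_rrr_scorer(X, rank=32)
print(np.abs(X @ q - approx_scores(U, V, q)).max())  # approximation error
```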


One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

Neural Information Processing Systems

Transformers have achieved great success in recent years. Interestingly, they have shown a particularly strong in-context learning capability: even without fine-tuning, they can solve unseen tasks well purely from task-specific prompts. In this paper, we study the capability of one-layer transformers to learn the one-nearest neighbor prediction rule. Under a theoretical framework where the prompt contains a sequence of labeled training data and unlabeled test data, we show that, although the loss function is nonconvex, when trained with gradient descent, a single softmax attention layer can successfully learn to behave like a one-nearest neighbor classifier. Our result gives a concrete example of how transformers can be trained to implement nonparametric machine learning algorithms, and sheds light on the role of softmax attention in transformer models.
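The mechanism is easy to see in a toy example: with attention logits proportional to the negative squared distance between the test input and each prompt example, a large inverse temperature makes the softmax weights concentrate on the nearest labeled example. The snippet below illustrates this limiting behavior; it is not the paper's construction or its gradient-descent training analysis.

```python
# Softmax attention approaching a 1-nearest-neighbor classifier (toy demo).
import numpy as np

def softmax_attention_1nn(x_test, X, y, beta=50.0):
    logits = -beta * np.sum((X - x_test) ** 2, axis=1)  # -beta * ||x - x_i||^2
    w = np.exp(logits - logits.max())
    w /= w.sum()                                        # attention weights
    return w @ y                                        # weighted label vote

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))          # labeled prompt examples
y = rng.choice([-1.0, 1.0], size=20)      # labels in {-1, +1}
x_test = rng.standard_normal(5)
nn = y[np.argmin(np.linalg.norm(X - x_test, axis=1))]
print(softmax_attention_1nn(x_test, X, y), nn)          # nearly identical
```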


Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

Neural Information Processing Systems

Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, address these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent text. In this paper, we introduce Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that can incorporate real-world text spans of arbitrary length into LM generations and provide attribution to their sources. NEST performs token-level retrieval at each inference step to compute a semi-parametric mixture distribution and identify promising span continuations in a corpus.
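The semi-parametric mixture NEST builds on can be sketched in a kNN-LM style: retrieve the k nearest datastore entries for the current hidden state and interpolate their token distribution with the LM's. The snippet below shows only that mixture step; NEST's span selection and speculative decoding are omitted, and all names and the interpolation weight lam are our assumptions.

```python
# kNN-LM-style mixture distribution (simplified; not NEST's full method).
import numpy as np

def knn_mixture(lm_probs, query_hidden, keys, values, vocab, k=4, lam=0.5, tau=1.0):
    """Mix the LM's next-token distribution with a retrieval distribution.
    keys: datastore context vectors; values: the tokens that followed them."""
    d = np.linalg.norm(keys - query_hidden, axis=1)
    idx = np.argsort(d)[:k]                      # k nearest datastore entries
    w = np.exp(-d[idx] / tau)
    w /= w.sum()
    p_knn = np.zeros(vocab)
    for token, weight in zip(values[idx], w):
        p_knn[token] += weight                   # mass on retrieved tokens
    return lam * p_knn + (1 - lam) * lm_probs    # semi-parametric mixture

rng = np.random.default_rng(0)
vocab = 10
lm_probs = rng.dirichlet(np.ones(vocab))         # toy LM distribution
keys = rng.standard_normal((100, 8))
values = rng.integers(vocab, size=100)
print(knn_mixture(lm_probs, rng.standard_normal(8), keys, values, vocab))
```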


Online Consistency of the Nearest Neighbor Rule

Neural Information Processing Systems

In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction. A learning rule is online consistent if its mistake rate eventually vanishes. The nearest neighbor rule is a fundamental prediction strategy, but it is only known to be consistent under strong statistical or geometric assumptions: the instances come i.i.d., or the label classes are well-separated. We prove online consistency for all measurable functions in doubling metric spaces under the mild assumption that instances are generated by a process that is uniformly absolutely continuous with respect to an underlying finite, upper doubling measure.
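For readers unfamiliar with the protocol, here is a minimal sketch of the nearest neighbor rule in this online setting: predict the label of the closest previously seen instance, observe the true label, and add the pair to memory. The stream and target function are toy assumptions used only to show the protocol.

```python
# The online nearest neighbor rule and its empirical mistake rate (sketch).
import numpy as np

def online_1nn(stream, target):
    mem_x, mem_y, mistakes = [], [], []
    for x in stream:
        if mem_x:  # predict via the nearest stored instance
            pred = mem_y[int(np.argmin([np.linalg.norm(x - m) for m in mem_x]))]
        else:
            pred = 0                          # arbitrary first guess
        y = target(x)                         # label revealed after predicting
        mistakes.append(pred != y)
        mem_x.append(x)
        mem_y.append(y)
    return np.cumsum(mistakes) / (np.arange(len(stream)) + 1)

rng = np.random.default_rng(0)
stream = rng.uniform(-1, 1, size=(2000, 2))   # a uniformly distributed process
rates = online_1nn(stream, target=lambda x: int(x[0] * x[1] > 0))
print(rates[-1])                              # mistake rate shrinks over time
```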


Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits

Neural Information Processing Systems

There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of (approximately) "navigable" graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy, where we always move to the neighbor closest to the destination according to the given distance function. The complete graph is trivially navigable for any point set, but the important question for applications is whether sparser graphs can be constructed. While this question is fairly well understood in low dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree O(\sqrt{n \log n}) for any set of n points, in any dimension, for any distance function.
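Greedy routing, the notion that defines navigability here, is simple to state in code: from any start node, repeatedly move to the neighbor closest to the destination, and declare failure if no neighbor improves. The sketch below is a direct transcription of that definition, not the paper's construction achieving the O(\sqrt{n \log n}) degree bound.

```python
# Greedy routing and a brute-force navigability check (definition as code).
import numpy as np

def greedy_route(graph, points, start, target):
    cur = start
    while cur != target:
        nbrs = graph[cur]
        if not nbrs:
            return False
        nxt = min(nbrs, key=lambda i: np.linalg.norm(points[i] - points[target]))
        if (np.linalg.norm(points[nxt] - points[target])
                >= np.linalg.norm(points[cur] - points[target])):
            return False                 # stuck at a local minimum
        cur = nxt
    return True

def is_navigable(graph, points):
    n = len(points)
    return all(greedy_route(graph, points, s, t)
               for s in range(n) for t in range(n) if s != t)
```

On the complete graph, greedy_route always jumps straight to the target, matching the observation that the complete graph is trivially navigable.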


Embedding Dimension of Contrastive Learning and k -Nearest Neighbors

Neural Information Processing Systems

We study the embedding dimension of distance comparison data in two settings: contrastive learning and k-nearest neighbors (k-NN). In both cases, the goal is to find the smallest dimension d of an \ell_p-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used \ell_2-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given m labeled samples of the form (x_i, y_i^+, z_i^-), representing the fact that the positive example y_i is closer to the anchor x_i than the negative example z_i.
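The triplet format is straightforward to state in code: an embedding is consistent with the data if every anchor is closer to its positive than to its negative. The sketch below checks this for a candidate \ell_2 embedding; all names are ours.

```python
# Checking contrastive triplet constraints under the l2 distance (sketch).
import numpy as np

def satisfies_triplets(emb, triplets):
    """emb: dict item -> vector in R^d; triplets: (anchor, positive, negative)."""
    return all(np.linalg.norm(emb[x] - emb[y]) < np.linalg.norm(emb[x] - emb[z])
               for x, y, z in triplets)

emb = {"a": np.array([0.0, 0.0]), "b": np.array([1.0, 0.0]),
       "c": np.array([3.0, 4.0])}
print(satisfies_triplets(emb, [("a", "b", "c")]))   # True: b closer to a than c
```

The embedding-dimension question is then: what is the smallest d for which some such embedding satisfies all m constraints?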