Goto

Collaborating Authors

 nearest neighbor


Nearly-Linear Time and Massively Parallel Algorithms for k-Anonymity

Neural Information Processing Systems

Previous algorithms with provable guarantees either (1) achieve the same O(k)approximation ratio but require at least O(n2k) runtime, or (2) provide a better O(logk) approximation ratio at the cost of an impractical O(n2k) worst-case runtime for general d and k. Our algorithm extends to the Massively Parallel Computation (MPC) model, where it gives an MPC algorithm requiring eO(log1+ฮต n) rounds and total space O(n1+ฮณ(d+k)). Empirically, we also demonstrate that our algorithmic ideas can be adapted to existing heuristic methods, leading to significant speed-ups while preserving comparable performance. On the hardness side, we study the related single-point k-anonymity problem, where the goal is to select k 1 additional records to make a given record indistinguishable. Assuming the dense vs random conjecture in complexity theory, we show that for n = kc, no algorithm can achieve a k1 O(1/c) approximation in poly(n) time, providing evidence for the inherent hardness of the k-anonymity problem.


Results of the Big ANN: NeurIPS'23 competition

Neural Information Processing Systems

The 2023 Big ANNChallenge, held at NeurIPS'23, aimed at advancing the stateof-the-art in indexing data structures and search algorithms. It focused for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search [21], this competition addressed filtered search, out-of-distribution data, sparse and streaming variants of ANNS. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources.


Beyond Benign Overfitting in Nadaraya-Watson Interpolators

Neural Information Processing Systems

In recent years, there has been much interest in understanding the generalization behavior of interpolating predictors, which overfit on noisy training data. Whereas standard analyses are concerned with whether a method is consistent or not, recent observations have shown that even inconsistent predictors can generalize well. In this work, we revisit the classic interpolating Nadaraya-Watson (NW) estimator (also known as Shepard's method), and study its generalization capabilities through this modern viewpoint. In particular, by varying a single bandwidth-like hyperparameter, we prove the existence of multiple overfitting behaviors, ranging non-monotonically from catastrophic, through benign, to tempered. Our results highlight how even classical interpolating methods can exhibit intricate generalization behaviors. In addition, for the purpose of tuning the hyperparameter, the results suggest that over-estimating the intrinsic dimension of the data is less harmful than under-estimating it. Numerical experiments complement our theory, demonstrating the same phenomena.


Nearest-Neighbor Radii under Dependent Sampling

arXiv.org Machine Learning

Nearest-neighbor methods are fundamental to classical and modern machine learning, yet their geometric properties are typically analyzed under independent sampling. In this paper, we study the nearest-neighbor radii under dependent sampling. We consider strong mixing dependent observations and ask whether dependence changes the scale of nearest-neighbor neighborhoods. We establish distribution-free almost sure convergence under polynomial mixing and sharp non-asymptotic moment bounds under geometric mixing. The moment bounds depend on the local intrinsic dimension rather than the ambient dimension, making the results applicable to high-dimensional data concentrated near lower-dimensional manifolds. Synthetic experiments and real-world time-series benchmarks support the theory, showing that nearest-neighbor geometry remains informative under dependence sampling.


Retrieval-Augmented Diffusion Models

Neural Information Processing Systems

Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Much of this success is due to the scalability of these architectures and hence caused by a dramatic increase in model complexity and in the computational resources invested in training these models.


ANear-Linear Time Algorithm for the Chamfer Distance

Neural Information Processing Systems

Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the quadratic dependence on n in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first (1+")-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time O ndlog(n)/"2 and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large highdimensional point clouds. We also give evidence that if the goal is to report a (1+")-approximate mapping from A to B (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.


Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

Neural Information Processing Systems

Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search query with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least 0.1n steps on instances of size nbefore it encounters any of the 5nearest neighbors of the query.


Consistent Non-Parametric Methods for Maximizing Robustness

Neural Information Processing Systems

Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is there is an artificial robustness radius r that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of data, and smaller in others. In this paper, we address this limitation by proposing a new limit classifier, called the neighborhood optimal classifier, that extends the Bayes optimal classifier outside its support by using the label of the closest in-support point. We then argue that this classifier maximizes the size of its robustness regions subject to the constraint of having accuracy equal to the Bayes optimal. We then present sufficient conditions under which general non-parametric methods that can be represented as weight functions converge towards this limit, and show that both nearest neighbors and kernel classifiers satisfy them under certain conditions.



Supplementary AViT 3B model

Neural Information Processing Systems

The ViT model we use in this work is based on a standard Vision Transformer [7] model scaled to577 nearly 3 billion parameters, using a patch size of 14, 16 heads, 64 blocks, an MLP dimension of 8192578 and a hidden dimension of 2048. The model is defined and trained in Lingvo [32]; we additionally579 employ GSPMD [41] for training. The model is pre-trained on JFT-3B [35] using training settings580 that optimize for performance on JFT-3B rather than for fine-tuning on ImageNet; notably, we do not581 use the training recipe that helps few-shot transfer performance [44]. BReview tools586 We include screenshots of the reviewing tools we built to analyze model mistakes. Figure 3 shows587 the UI for reviewing model predictions and Figure 4 shows the UI that displays the labeling guide588 and slide bar to browse images for a particular class.