distance oracle
We thank the reviewer for taking the time to review our submission and for their helpful
We will elaborate on works relevant to the noiseless triplet setting in the related work section. Thank you also for pointing out the typos. That is an interesting way of looking at the problem. If the dataset consists of hierarchical clusters as in the condition for Theorem 4.6, This is made more explicit in the discussion following Theorem 4.4. Houle, Michael E., and Michael Nett (2013), Rank cover trees for nearest neighbor search, International Conference on
Fast Distance Oracles for Any Symmetric Norm
This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of \ell_p norms, the problem is well understood, and optimal data structures are known for most values of p. This class includes \ell_p norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. We propose a novel data structure with \tilde{O}(n (d + \mathrm{mmc}(l)^2)) preprocessing time and space, and t_q = \tilde{O}(d + n \cdot \mathrm{mmc}(l)^2) query time, where \mathrm{mmc}(l) is a complexity-measure (modulus) of the symmetric norm under consideration.
Nearest Neighbor Search Under Uncertainty
Mason, Blake, Tripathy, Ardhendu, Nowak, Robert
Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning. There is a vast literature on efficient algorithms for constructing data structures and performing exact and approximate NNS. This paper studies NNS under Uncertainty (NNSU). Specifically, consider the setting in which an NNS algorithm has access only to a stochastic distance oracle that provides a noisy, unbiased estimate of the distance between any pair of points, rather than the exact distance. This models many situations of practical importance, including NNS based on human similarity judgements, physical measurements, or fast, randomized approximations to exact distances. A naive approach to NNSU could employ any standard NNS algorithm and repeatedly query and average results from the stochastic oracle (to reduce noise) whenever it needs a pairwise distance. The problem is that a sufficient number of repeated queries is unknown in advance; e.g., a point may be distant from all but one other point (crude distance estimates suffice) or it may be close to a large number of other points (accurate estimates are necessary). This paper shows how ideas from cover trees and multi-armed bandits can be leveraged to develop an NNSU algorithm that has optimal dependence on the dataset size and the (unknown) geometry of the dataset.
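The contrast drawn in this abstract, between averaging a fixed number of oracle calls and adapting the number of calls to the geometry, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Gaussian noise model, the function names, and the confidence radius. In particular, elimination_nearest is a generic successive-elimination bandit routine over a flat candidate list, not the cover-tree algorithm of the paper.

```python
import math
import random


def make_noisy_oracle(points, sigma, rng):
    """Hypothetical stochastic oracle: Euclidean distance plus zero-mean Gaussian noise."""
    def oracle(i, j):
        return math.dist(points[i], points[j]) + rng.gauss(0.0, sigma)
    return oracle


def naive_nearest(oracle, query, candidates, repeats):
    """Naive approach: average a fixed number of oracle calls per candidate."""
    def mean_dist(j):
        return sum(oracle(query, j) for _ in range(repeats)) / repeats
    return min(candidates, key=mean_dist)


def elimination_nearest(oracle, query, candidates, sigma, delta=0.05, max_rounds=10000):
    """Successive elimination: sample every surviving candidate once per round,
    then drop candidates whose lower confidence bound exceeds the smallest
    upper confidence bound. Far-away candidates are discarded after few samples."""
    active = list(candidates)
    sums = {j: 0.0 for j in active}
    n = len(active)
    for t in range(1, max_rounds + 1):
        for j in active:
            sums[j] += oracle(query, j)
        # Assumed confidence radius (union bound over candidates and rounds).
        radius = sigma * math.sqrt(2.0 * math.log(4.0 * n * t * t / delta) / t)
        means = {j: sums[j] / t for j in active}
        best_upper = min(means.values()) + radius
        active = [j for j in active if means[j] - radius <= best_upper]
        if len(active) == 1:
            break
    # All survivors have the same sample count, so comparing sums compares means.
    return min(active, key=lambda j: sums[j])


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
    oracle = make_noisy_oracle(pts, sigma=0.5, rng=rng)
    print(naive_nearest(oracle, 0, [1, 2, 3], repeats=500))
    print(elimination_nearest(oracle, 0, [1, 2, 3], sigma=0.5))
```

The naive routine needs repeats to be chosen in advance, which is the difficulty the abstract points out. The elimination routine stops sampling a candidate as soon as the confidence intervals separate it from the current best, so the sampling effort follows the gaps between distances.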