A More Experiments

In our experiments, we adopt standard exact Hamming search by linear scan. On a single 2.0 GHz CPU core, with the search compiled in C++, scanning 1M SIFT samples takes approximately 0.15 s per query with b = 512. Note that linear scan is a naive strategy. First, we see that the differences among the curves are very small.

B.2 Ranking Efficiency: More c and ρ Values

We provide more theoretical comparisons of ranking efficiency at additional ρ and c values.

Figure 14: CIFAR-VGG top-10 retrieved images (right) for two example query images (left, automobile and cat) with b = 512.
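The linear-scan baseline above can be sketched as follows; this is a minimal illustration (function and variable names are mine, not the paper's), using XOR plus popcount to compute exact Hamming distances over packed binary codes:

```python
import numpy as np

def hamming_linear_scan(query, codes):
    """Exact Hamming search by brute-force linear scan.

    query: uint8 array of shape (b//8,)   -- one b-bit code, bit-packed
    codes: uint8 array of shape (n, b//8) -- n database codes, bit-packed
    Returns the index of the nearest code and all n distances.
    """
    # XOR marks the differing bits; unpacking and summing counts them.
    xor = np.bitwise_xor(codes, query)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)
    return int(np.argmin(dists)), dists

# Toy usage with b = 64 (8 bytes per code).
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(1000, 8), dtype=np.uint8)
q = db[42].copy()                  # plant an exact match at index 42
nn, d = hamming_linear_scan(q, db)
```

A production implementation would instead use hardware popcount on 64-bit words, but the scan is O(n) per query either way, which is why the sub-linear methods discussed below matter at scale.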
First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. This paper generalizes the LSH method to account for the (bounded) norms of the database vectors, so that the LSH tricks for fast approximate nearest neighbor search can exploit the well-known relation between Euclidean distance and dot-product similarity (e.g., as in equation 2) and thereby support MIPS as well. The authors give three motivating examples where solving MIPS rather than kNN per se is more appropriate and needed. Their algorithm is essentially equation 9 (using equation 7 to compute vector reformulations Q(q) and P(x) of the query and a database element, respectively). It is based on the apparently novel observation (equation 8) that the distance from the query converges to the dot product plus a constant when a parameter m, which governs the exponents applied to the appended P(x) elements, is sufficiently large (e.g.
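The transformations the review describes are, as I understand them, the Shrivastava-Li ALSH construction; the sketch below is illustrative (parameter values m and U are mine) and numerically checks the equation-8 identity: after rescaling the database so that all norms are below U < 1 and normalizing the query, the squared distance between the transformed vectors equals 1 + m/4 - 2 q.x plus a vanishing term ||x||^(2^(m+1)):

```python
import numpy as np

m, U = 3, 0.83  # illustrative choices; the paper tunes these

def P(x):
    """Database transform: append ||x||^2, ||x||^4, ..., ||x||^(2^m)."""
    norms = [np.linalg.norm(x) ** (2 ** (i + 1)) for i in range(m)]
    return np.concatenate([x, norms])

def Q(q):
    """Query transform: append m constants equal to 1/2."""
    return np.concatenate([q, np.full(m, 0.5)])

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 16))
X *= U / np.linalg.norm(X, axis=1).max()   # scale so max norm = U < 1
q = rng.normal(size=16)
q /= np.linalg.norm(q)                     # unit-norm query

# ||Q(q) - P(x)||^2 = 1 + m/4 - 2 q.x + ||x||^(2^(m+1)); the last term
# shrinks doubly exponentially in m, so minimizing this distance
# (approximately) maximizes the inner product.
ips = X @ q
dists = np.array([np.sum((Q(q) - P(x)) ** 2) for x in X])
```

The telescoping in ||P(x)||^2 - 2 Q(q).P(x) is what leaves only the single residual term, which is the observation the review calls equation 8.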
- Asia > Afghanistan > Parwan Province > Charikar (0.05)
- North America > Canada > Quebec > Montreal (0.05)
Reviews: Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
The paper presents locality-sensitive hashing schemes for well-studied distance functions between probability distributions. The new schemes are based on two ideas. The first is to approximate the distance function of interest by another distance function for which LSH schemes are known; in particular, the paper shows how to approximate the MIL divergence and the triangular discrimination by the Hellinger distance, for which LSH schemes are known. The second is specific to the MIL divergence, and involves representing that distance function as a so-called Krein kernel and designing an asymmetric LSH scheme for it.
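For the triangular-discrimination case, the approximation idea can be checked numerically; the sketch below (my notation, not the paper's) verifies the standard termwise sandwich TD/2 <= 2*H^2 <= TD, which follows from (p+q) <= (sqrt(p)+sqrt(q))^2 <= 2(p+q), so twice the squared Hellinger distance approximates the triangular discrimination within a factor of 2:

```python
import numpy as np

# 100 random pairs of 20-dimensional probability distributions.
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(20), size=100)
Q = rng.dirichlet(np.ones(20), size=100)

# Squared Hellinger distance: 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2
h2 = 0.5 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2, axis=1)

# Triangular discrimination: sum_i (p_i - q_i)^2 / (p_i + q_i)
td = np.sum((P - Q) ** 2 / (P + Q), axis=1)
```

A constant-factor approximation like this is exactly what lets an LSH family for one distance serve, with degraded but controlled guarantees, as an LSH family for the other.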
Fast Exact Search in Hamming Space with Multi-Index Hashing
Norouzi, Mohammad, Punjani, Ali, Fleet, David J.
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not used this way, because direct indexing with long codes was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
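The core pigeonhole idea behind multi-index hashing can be sketched as follows (helper names are mine, and this simplified version indexes unpacked bit arrays rather than the paper's optimized packed representation): split each b-bit code into m disjoint substrings, one hash table per substring; any code within Hamming distance r of the query must agree with the query to within floor(r/m) bits on at least one substring, so probing each table near the query substring and verifying candidates gives exact r-neighbor search:

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def build_tables(codes, m):
    """codes: (n, b) array of 0/1 bits. One hash table per substring."""
    n, b = codes.shape
    chunks = np.array_split(np.arange(b), m)     # disjoint bit ranges
    tables = [defaultdict(list) for _ in range(m)]
    for i, c in enumerate(codes):
        for t, idx in zip(tables, chunks):
            t[c[idx].tobytes()].append(i)
    return tables, chunks

def search(q, codes, tables, chunks, r):
    """Exact r-neighbor search: probe each table within radius
    floor(r/m), then verify candidates against the full code."""
    m = len(tables)
    sub_r = r // m
    cand = set()
    for t, idx in zip(tables, chunks):
        sub = q[idx]
        # enumerate all substrings within Hamming distance sub_r
        for k in range(sub_r + 1):
            for flip in combinations(range(len(idx)), k):
                s = sub.copy()
                s[list(flip)] ^= 1
                cand.update(t.get(s.tobytes(), []))
    # verification step: full Hamming distance on the candidate set
    return sorted(i for i in cand if np.sum(codes[i] != q) <= r)

rng = np.random.default_rng(3)
codes = rng.integers(0, 2, size=(2000, 64), dtype=np.uint8)
q = codes[7].copy()
q[[3, 40]] ^= 1        # query is 2 bits away from code 7
tables, chunks = build_tables(codes, m=4)
hits = search(q, codes, tables, chunks, r=2)
```

With r = 2 and m = 4 the per-table radius is 0, so each probe is a single hash lookup; the sub-linear behavior comes from the candidate set being far smaller than n for well-spread codes.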
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.60)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Learning to Hash with Binary Reconstructive Embeddings
Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques.
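The kind of objective the abstract describes can be sketched as below; this is my paraphrase with illustrative scaling (normalized distances, linear-threshold hash bits), not the paper's exact formulation or its coordinate-descent update:

```python
import numpy as np

def hash_codes(X, W):
    """b linear-threshold hash bits per point: h_k(x) = 1[w_k . x > 0]."""
    return (X @ W > 0).astype(np.uint8)          # shape (n, b)

def reconstruction_error(X, W):
    """Squared gap between normalized original distances and
    normalized Hamming distances of the binary embeddings."""
    b = W.shape[1]
    H = hash_codes(X, W)
    # pairwise squared Euclidean distances, rescaled to [0, 1]
    D = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    D /= D.max()
    # normalized pairwise Hamming distances, also in [0, 1]
    Dh = np.sum(H[:, None] != H[None, :], axis=-1) / b
    return np.sum((D - Dh) ** 2)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 8))
W = rng.normal(size=(8, 16))    # 16 hash functions on 8-dim inputs
err = reconstruction_error(X, W)
```

The paper's contribution is a scalable coordinate-descent scheme that updates one hash function at a time to lower this kind of error, which is tractable because changing a single bit only perturbs one term of each pairwise Hamming distance.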
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)