Super-Bit Locality-Sensitive Hashing
Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian
Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension-reduction method that provides an unbiased estimate of angular similarity, but suffers from the large variance of its estimate. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, orthogonalizing the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, with a smaller variance when the angle to estimate is within $(0,\pi/2]$. Extensive experiments on real data validate that, given the same length of binary code, SBLSH achieves a significant mean-squared-error reduction in estimating pairwise angular similarity. Moreover, SBLSH outperforms SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)
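The batch-orthogonalization idea in the abstract can be sketched as follows. This is not the authors' reference implementation, only a minimal NumPy illustration: it assumes Gaussian random projections, orthogonalization of each batch via QR decomposition (equivalent to Gram-Schmidt), and the standard SRP estimator in which the normalized Hamming distance times $\pi$ estimates the angle; the function names `srp_lsh`, `sblsh`, and `estimated_angle` are my own.

```python
import numpy as np

def srp_lsh(X, n_bits, rng):
    """Plain SRP-LSH: one independent Gaussian projection vector per bit."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_bits))
    return X @ W >= 0  # boolean codes, shape (n_samples, n_bits)

def sblsh(X, n_bits, batch_size, rng):
    """Super-Bit LSH sketch: orthogonalize the Gaussian projection
    vectors in batches of `batch_size` (the Super-Bit depth, <= d)."""
    d = X.shape[1]
    assert batch_size <= d
    cols = []
    for start in range(0, n_bits, batch_size):
        b = min(batch_size, n_bits - start)
        G = rng.standard_normal((d, b))
        Q, _ = np.linalg.qr(G)  # orthonormal columns within the batch
        cols.append(Q)
    W = np.hstack(cols)
    return X @ W >= 0

def estimated_angle(codes_a, codes_b):
    """Normalized Hamming distance times pi estimates the angle."""
    n_bits = codes_a.shape[-1]
    return np.count_nonzero(codes_a != codes_b, axis=-1) / n_bits * np.pi
```

Both hash families produce the same kind of binary code; the claimed difference is only in the variance of the resulting angle estimate.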
S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions
Mao, Xian-Ling (Beijing Institute of Technology) | Feng, Bo-Si (Beijing Institute of Technology) | Hao, Yi-Jing (Beijing Institute of Technology) | Nie, Liqiang (National University of Singapore) | Huang, Heyan (Beijing Institute of Technology) | Wen, Guihua (South China University of Technology)
To compare the similarity of probability distributions, information-theoretically motivated metrics like Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable than metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support these information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for the S2JSD-distance, and then propose a novel LSH scheme adapted to the S2JSD-distance for approximate nearest neighbor search in high-dimensional probability distributions. We define the specific hashing functions and prove their locality-sensitivity. Furthermore, extensive empirical evaluations illustrate the effectiveness of the proposed hashing scheme on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curves.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)
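For reference, the S2JSD-distance that this paper hashes against is the square root of twice the Jensen-Shannon divergence, which is a true metric on probability distributions. The sketch below computes it exactly; it is not the paper's approximation formula or hashing scheme, and the `eps` smoothing term is my own assumption to keep the logarithms finite on sparse distributions.

```python
import numpy as np

def s2jsd(p, q, eps=1e-12):
    """S2JSD distance: sqrt(2 * JSD(p, q)), a metric on distributions.
    `eps` smooths zero entries before renormalizing (an assumption here)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL divergence, natural log
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return np.sqrt(2.0 * jsd)
```

For example, two disjoint distributions such as [1, 0] and [0, 1] attain the maximum JSD of ln 2, so their S2JSD-distance is sqrt(2 ln 2) ≈ 1.177.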