One Permutation Hashing

Dec-31-2012–Neural Information Processing Systems

While minwise hashing is promising for large-scale learning on massive binary data, its preprocessing cost is prohibitive because it requires applying many permutations (e.g., $k=500$) to the data. Testing is also expensive when a new data point (e.g., a new document or a new image) has not yet been processed. In this paper, we develop a simple one permutation hashing scheme to address this issue. Although the preprocessing step can be parallelized, doing so costs additional hardware and implementation effort. Reducing $k$ permutations to just one is also much more energy-efficient, which may matter in practice because minwise hashing is commonly deployed in the search industry. In addition to the theoretical probability analysis, our experiments on similarity estimation and on SVM and logistic regression confirm the theoretical results.
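To make the cost argument concrete, here is a minimal sketch contrasting standard $k$-permutation minwise hashing with a one-permutation variant. It assumes binary vectors are stored as sets of nonzero indices in a $D$-dimensional space, and it assumes a bin-based construction: permute once, split the permuted positions into $k$ equal-width bins, and keep the smallest permuted offset in each bin. The function names (`minwise_hash_k_perms`, `one_permutation_hash`, `estimate_k_perms`, `estimate_one_perm`) and the empty-bin handling are hypothetical choices for this illustration, not details given in the abstract.

```python
import random


def minwise_hash_k_perms(nonzeros, perms):
    """Standard minwise hashing: one hash value per permutation.
    Every permutation touches every nonzero, so the cost is O(k * f)
    for a vector with f nonzeros and k permutations."""
    return [min(perm[i] for i in nonzeros) for perm in perms]


def one_permutation_hash(nonzeros, perm, k):
    """Assumed bin-based one-permutation sketch: permute once, split the
    D permuted positions into k equal-width bins, and keep the smallest
    permuted offset seen in each bin. Empty bins are recorded as None.
    Cost is a single O(f) pass over the f nonzeros."""
    D = len(perm)
    if D % k != 0:
        raise ValueError("this sketch assumes D is divisible by k")
    width = D // k
    bins = [None] * k
    for i in nonzeros:
        b, offset = divmod(perm[i], width)  # bin index and position inside it
        if bins[b] is None or offset < bins[b]:
            bins[b] = offset
    return bins


def estimate_k_perms(h1, h2):
    """Fraction of permutations whose minima collide."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def estimate_one_perm(h1, h2):
    """Matching non-empty bins divided by the number of bins that are not
    empty in both sketches (jointly empty bins say nothing about the sets).
    This handling of empty bins is an assumption of the sketch."""
    k = len(h1)
    n_emp = sum(a is None and b is None for a, b in zip(h1, h2))
    n_mat = sum(a is not None and a == b for a, b in zip(h1, h2))
    if n_emp == k:
        raise ValueError("all bins are empty in both sketches")
    return n_mat / (k - n_emp)


# Usage on two toy sets of nonzero indices in a D-dimensional space.
D, k = 1024, 64
rng = random.Random(0)

perm = list(range(D))          # the single permutation
rng.shuffle(perm)
perms = []                     # k independent permutations for the baseline
for _ in range(k):
    p = list(range(D))
    rng.shuffle(p)
    perms.append(p)

S1, S2 = set(range(0, 600)), set(range(200, 800))
true_r = len(S1 & S2) / len(S1 | S2)   # resemblance 400 / 800 = 0.5

est_k = estimate_k_perms(minwise_hash_k_perms(S1, perms),
                         minwise_hash_k_perms(S2, perms))
est_one = estimate_one_perm(one_permutation_hash(S1, perm, k),
                            one_permutation_hash(S2, perm, k))
print(f"true R = {true_r:.3f}, k perms = {est_k:.3f}, one perm = {est_one:.3f}")
```

The design point is preprocessing cost: the baseline applies each of the $k$ permutations to every nonzero, while the one-permutation version makes a single pass over the data. The price is that some bins may be empty; in this sketch, bins that are empty in both vectors are excluded from the denominator so they are not counted as matches or mismatches.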

artificial intelligence, machine learning, permutation


