b-Bit Minwise Hashing for Estimating Three-Way Similarities
Li, Ping, Konig, Arnd, Gui, Wenhao
–Neural Information Processing Systems
Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b>= 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that $b$-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.
Neural Information Processing Systems
Dec-31-2010
- Country:
- Europe (1.00)
- North America
- Canada (0.94)
- United States > California (0.46)
- Technology:
- Information Technology
- Data Science > Data Mining (1.00)
- Communications (1.00)
- Information Management > Search (0.94)
- Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Information Retrieval (0.46)
- Information Technology