Hashing Algorithms for Large-Scale Learning

Li, Ping, Shrivastava, Anshumali, Moore, Joshua L., König, Arnd C.

Dec-31-2011–Neural Information Processing Systems

Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory. We compare $b$-bit minwise hashing with the Count-Min (CM) and Vowpal Wabbit (VW) algorithms, which have essentially the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is significantly more accurate (at the same storage cost) than VW (and random projections) for binary data.

artificial intelligence, b-bit minwise, machine learning, (14 more...)

Neural Information Processing Systems

Dec-31-2011

Conferences PDF

Add feedback

Country:
- Europe (1.00)
- North America
  - United States (0.68)
  - Canada (0.46)

Genre:
- Research Report (0.56)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.49)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found