Reviews: Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

Jun-1-2025, 06:03:51 GMT–Neural Information Processing Systems

The authors argue that the optimal densification scheme for One Permutation Hashing (OPH) can be improved further. In standard OPH, we apply a single permutation to the sparse vector and split the permuted vector into K equal-sized bins. In the usual Consistent Weighted Sampling (CWS) approach, we sample a non-empty bin from among these K bins and retrieve a fixed hash code from it. In the new approach, the authors instead treat each of the K bins as a separate sparse vector and run MinHash on the retrieved bin to obtain the hash code, rather than taking a hash code directly. They prove theoretically that this re-randomization achieves the smallest variance among densification schemes (the schemes used to assign hash codes to empty bins). They also extend the idea to non-negative weighted sparse vectors through a method called Bin-wise CWS. The paper seems to be a subtle improvement over prior work.
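To make the summary above concrete, here is a minimal Python sketch of OPH for binary data with two ways of filling empty bins: copying the hash of a sampled non-empty bin, and re-hashing the sampled bin with a fresh per-bin permutation. It follows this review's description only, not the paper's pseudocode. All function names, the string-keyed seeding, and the assumption that the dimension is divisible by K are illustrative choices, and the weighted (Bin-wise CWS) case is not covered.

```python
import random


def oph_bins(nonzeros, dim, num_bins, seed=0):
    """Apply one shared permutation, then group positions into equal-width bins."""
    perm = list(range(dim))
    random.Random(seed).shuffle(perm)
    width = dim // num_bins  # assumption: dim is divisible by num_bins
    bins = [[] for _ in range(num_bins)]
    for i in nonzeros:
        pos = perm[i]
        bins[pos // width].append(pos % width)  # store the offset inside the bin
    return bins


def _pick_nonempty(bins, k, seed):
    """Walk a random bin sequence keyed by the empty bin k until a non-empty bin is hit.

    The sequence depends only on (seed, k), so every input vector walks the same one.
    """
    if not any(bins):
        raise ValueError("cannot densify an all-zero vector")
    rng = random.Random(f"pick-{seed}-{k}")
    while True:
        j = rng.randrange(len(bins))
        if bins[j]:
            return j


def densify_copy(bins, seed=0):
    """Fill each empty bin by copying the hash (min offset) of a sampled non-empty bin."""
    out = []
    for k, b in enumerate(bins):
        src = b if b else bins[_pick_nonempty(bins, k, seed)]
        out.append(min(src))
    return out


def densify_rerandomized(bins, width, seed=0):
    """Fill each empty bin by running MinHash on the sampled bin under a fresh
    permutation that is specific to the empty bin k."""
    out = []
    for k, b in enumerate(bins):
        if b:
            out.append(min(b))
            continue
        j = _pick_nonempty(bins, k, seed)
        order = list(range(width))
        random.Random(f"rerand-{seed}-{k}").shuffle(order)
        out.append(min(order[pos] for pos in bins[j]))
    return out


def estimate(h1, h2):
    """Fraction of matching hash codes, used as the similarity estimate."""
    return sum(x == y for x, y in zip(h1, h2)) / len(h1)
```

To compare two sets, both must be hashed with the same `seed`, `dim`, and `num_bins`, so that the shared permutation and the per-bin random choices line up; `estimate` then compares the two length-K code vectors position by position.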

bin-wise consistent weighted sampling, consistent weighted sampling, re-randomized densification

Genre:
- Summary/Review (0.41)

Technology:
- Information Technology
  - Data Science > Data Mining (0.81)
  - Artificial Intelligence > Machine Learning (0.64)