Learning-Based Hashing for ANN Search: Foundations and Early Advances
arXiv.org Artificial Intelligence
Approximate Nearest Neighbour (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes that enable fast similarity computations in Hamming space. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data rather than chosen at random. This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field. We review supervised, unsupervised, and semi-supervised approaches, highlighting how projection functions are designed to generate meaningful embeddings and how quantisation strategies convert these embeddings into binary codes. We also examine extensions to multi-bit and multi-threshold models, as well as early advances in cross-modal retrieval. Rather than providing an exhaustive account of the most recent methods, our goal is to introduce the conceptual foundations of learning-based hashing for ANN search. By situating these early models in their historical context, we aim to equip readers with a structured understanding of the principles, trade-offs, and open challenges that continue to inform current research in this area.
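The project-then-quantise pipeline described in the abstract can be illustrated with a minimal sketch. It shows the data-independent baseline that learning-to-hash methods improve on: hyperplanes drawn at random, a single threshold per projection, and comparison by Hamming distance. The function names (`make_hyperplanes`, `hash_code`, `hamming`) and all parameters are hypothetical and are not taken from the survey. A learned method would, under this framing, replace the random hyperplanes and the zero thresholds with values optimised from data.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (the data-independent baseline)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, hyperplanes, thresholds=None):
    """Project x onto each hyperplane, then quantise each projection to one bit.

    thresholds defaults to 0.0 per bit (sign quantisation); passing other
    values is an illustrative hook for data-dependent thresholds.
    """
    if thresholds is None:
        thresholds = [0.0] * len(hyperplanes)
    code = 0
    for w, t in zip(hyperplanes, thresholds):
        projection = sum(wi * xi for wi, xi in zip(w, x))
        # Append one bit: 1 if the point lies above the threshold, else 0.
        code = (code << 1) | (1 if projection > t else 0)
    return code


def hamming(a, b):
    """Hamming distance between two integer-packed binary codes."""
    return bin(a ^ b).count("1")


# Usage: hash two nearby vectors and one distant vector, then compare codes.
planes = make_hyperplanes(n_bits=16, dim=3, seed=1)
query = hash_code([1.0, 2.0, -0.5], planes)
near = hash_code([1.1, 1.9, -0.4], planes)
far = hash_code([-1.0, -2.0, 0.5], planes)
distances = (hamming(query, near), hamming(query, far))
```

Packing the bits into a single integer keeps the distance computation to one XOR and a bit count, which is the property that makes search in Hamming space fast.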
Oct-7-2025
- Country:
  - Asia
    - Afghanistan > Parwan Province
      - Charikar (0.04)
    - Japan > Honshū
      - Kansai > Kyoto Prefecture > Kyoto (0.04)
    - Middle East
      - Israel > Haifa District
        - Haifa (0.04)
      - Jordan (0.04)
    - Singapore (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - France (0.04)
    - Greece (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Ontario > Toronto (0.14)
      - Quebec > Montreal (0.04)
    - United States
      - New York > New York County
        - New York City (0.04)
      - New Jersey > Hudson County
        - Secaucus (0.04)
      - Alaska > Anchorage Municipality
        - Anchorage (0.04)
      - District of Columbia > Washington (0.04)
      - Washington > King County
        - Bellevue (0.04)
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Massachusetts > Middlesex County
      - Rhode Island > Providence County
        - Providence (0.04)
      - California > San Francisco County
        - San Francisco (0.14)
      - Nevada (0.04)
      - Maryland > Baltimore (0.04)
- Genre:
  - Overview (1.00)
- Industry:
  - Education > Educational Setting (0.47)
  - Information Technology > Services (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Statistical Learning (1.00)
    - Natural Language
      - Information Retrieval (0.88)
      - Text Processing (1.00)
    - Representation & Reasoning > Search (0.68)
    - Vision (1.00)