Representation Learning for Efficient and Effective Similarity Search and Recommendation
arXiv.org Artificial Intelligence
How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors, called hash codes, which require little storage and enable efficient similarity search, either through direct indexing into a hash table or through similarity computations in an appropriate space. Because hash codes are less expressive than real-valued representations, a core open challenge is how to generate hash codes that capture semantic content or latent properties well using a small number of bits, while ensuring that the codes are distributed in a way that does not reduce their search efficiency. State-of-the-art methods use representation learning to generate such hash codes, focusing on neural autoencoder architectures in which semantics are encoded into the hash codes by learning to reconstruct the inputs from which the codes were generated. This thesis addresses the above challenge and makes a number of contributions to representation learning that (i) improve the effectiveness of hash codes through more expressive representations and a similarity measure that is more effective than the Hamming distance, the current state of the art, and (ii) improve the efficiency of hash codes by learning representations that are especially suited to the choice of search method. The contributions are empirically validated on several tasks related to similarity search and recommendation.
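The two search modes mentioned in the abstract (direct hash-table indexing and similarity computation under the Hamming distance) can be illustrated with a minimal sketch. It is not the thesis's method: the 8-bit codes, the object ids, and the helper names `hamming`, `build_table`, `radius_search`, and `top_k` are all hypothetical, and the codes are hand-picked rather than learned.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def build_table(codes):
    """Map each hash code to the ids of the objects that share it."""
    table = defaultdict(list)
    for obj_id, code in codes.items():
        table[code].append(obj_id)
    return table


def radius_search(table, query, n_bits, radius):
    """Hash-table lookup: probe every bucket within `radius` bit flips of the query."""
    hits = []
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = query
            for p in positions:
                probe ^= 1 << p  # flip bit p of the query code
            hits.extend(table.get(probe, []))
    return hits


def top_k(codes, query, k):
    """Linear scan: rank all objects by Hamming distance to the query."""
    ranked = sorted(codes, key=lambda obj_id: (hamming(codes[obj_id], query), obj_id))
    return ranked[:k]


# Hypothetical 8-bit hash codes for four objects (illustrative values only).
codes = {"a": 0b10110010, "b": 0b10110011, "c": 0b01001101, "d": 0b10100010}
table = build_table(codes)
query = 0b10110010

near = radius_search(table, query, n_bits=8, radius=1)
ranked = top_k(codes, query, k=3)
print(near, ranked)
```

The number of probes in `radius_search` grows combinatorially with the radius, which is one reason the distribution of codes over buckets matters for search efficiency, as the abstract notes.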
Sep-4-2021
- Country:
- South America > Brazil (0.04)
- North America
- United States
- New York > New York County
- New York City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > San Francisco County
- San Francisco (0.13)
- Arizona > Maricopa County
- Scottsdale (0.04)
- Canada > Quebec
- Montreal (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Slovenia > Central Slovenia
- Municipality of Ljubljana > Ljubljana (0.04)
- Greece > Central Macedonia
- Thessaloniki (0.04)
- France > Île-de-France
- Denmark > Capital Region
- Copenhagen (0.05)
- Asia
- Middle East > Jordan (0.04)
- Singapore (0.04)
- China
- Hubei Province > Wuhan (0.04)
- Hong Kong (0.04)
- Genre:
- Research Report > Promising Solution (1.00)
- Overview (1.00)
- Industry:
- Leisure & Entertainment (1.00)
- Media > Music (0.67)
- Technology:
- Information Technology
- Information Management > Search (1.00)
- Data Science > Data Mining (1.00)
- Artificial Intelligence
- Representation & Reasoning > Personal Assistant Systems (1.00)
- Cognitive Science (1.00)
- Natural Language
- Text Processing (1.00)
- Information Retrieval (1.00)
- Machine Learning
- Statistical Learning (1.00)
- Neural Networks > Deep Learning (1.00)