AITopics | matthij douze

Machine learning and high dimensional vector search

Douze, Matthijs

arXiv.org Artificial Intelligence, Feb-24-2025

Most high-dimensional vector search methods are based on statistical tools, signal processing approaches, or graph traversal algorithms. Statistical tools include random projections [15] and dimensionality reduction (PCA and the SVD). Signal processing is employed primarily to compress vectors with quantization [30, 4, 22]. Most recent indexing methods rely on graphs [34, 49, 3, 11] that are built with graph traversal heuristics. Vector search (VS) is used in machine learning (ML) for training data deduplication [39] and searching ML embeddings [28, 5]. Therefore, there are many research teams around the world that are competent in both fields.

arXiv:2502.16931

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search

Severo, Daniel, Ottaviano, Giuseppe, Muckley, Matthew, Ullrich, Karen, Douze, Matthijs

arXiv.org Artificial Intelligence, Jan-16-2025

Approximate nearest neighbor search for vectors relies on indexes that are most often accessed from RAM. Therefore, storage is the factor limiting the size of the database that can be served from a machine. Lossy vector compression, i.e., embedding quantization, has been applied extensively to reduce the size of indexes. However, for inverted file and graph-based indices, auxiliary data such as vector ids and links (edges) can represent most of the storage cost. We introduce and evaluate lossless compression schemes for these cases. These approaches are based on asymmetric numeral systems or wavelet trees that exploit the fact that the ordering of ids is irrelevant within the data structures. In some settings, we are able to compress the vector ids by a factor of 7, with no impact on accuracy or search runtime. On billion-scale datasets, this results in a 30% reduction in index size. Furthermore, we show that for some datasets, these methods can also compress the quantized vector codes losslessly, by exploiting sub-optimalities in the original quantization algorithm. The source code for our approach is available at https://github.com/facebookresearch/vector_db_id_compression.
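
As a rough illustration of why the irrelevance of id ordering helps, the sketch below sorts the ids of one inverted list and stores the gaps between neighbors as variable-length integers. This is a simpler, swapped-in technique (sorted delta coding with varints), not the asymmetric numeral systems or wavelet tree schemes the abstract describes. The function names, the 64-bit baseline, and the random id set are assumptions made for the example.

```python
import random


def encode_id_set(ids):
    """Sort the ids, then store the gaps between neighbors as LEB128-style varints.

    Sorting is allowed only because the order of ids within the list carries
    no information. Ids are assumed to be non-negative integers.
    """
    out = bytearray()
    prev = 0
    for i in sorted(ids):
        gap = i - prev  # gaps are small when the ids are dense
        prev = i
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte, continuation flag clear
    return bytes(out)


def decode_id_set(buf):
    """Inverse of encode_id_set; returns the ids in ascending order."""
    ids, current, gap, shift = [], 0, 0, 0
    for b in buf:
        gap |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


# Hypothetical inverted list: 10,000 ids drawn from a database of 1,000,000 vectors.
random.seed(0)
ids = random.sample(range(1_000_000), 10_000)
blob = encode_id_set(ids)

raw_bytes = 8 * len(ids)  # assumed baseline: one 64-bit integer per id
print(f"raw: {raw_bytes} bytes, delta-coded: {len(blob)} bytes")
```

The decoder returns the ids in sorted order rather than the original order, which is acceptable only under the stated premise that the order within the list does not matter.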

arXiv:2501.10479

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.61)

Residual Quantization with Implicit Neural Codebooks

Huijben, Iris, Douze, Matthijs, Muckley, Matthew, van Sloun, Ruud, Verbeek, Jakob

arXiv.org Artificial Intelligence, Jan-26-2024

Vector quantization is a fundamental operation for data compression and vector search. To obtain high accuracy, multi-codebook methods increase the rate by representing each vector using codewords across multiple codebooks. Residual quantization (RQ) is one such method, which increases accuracy by iteratively quantizing the error of the previous step. The error distribution is dependent on previously selected codewords. This dependency is, however, not accounted for in conventional RQ, as it uses a generic codebook per quantization step. In this paper, we propose QINCo, a neural RQ variant which predicts specialized codebooks per vector using a neural network that is conditioned on the approximation of the vector from previous steps. Experiments show that QINCo outperforms state-of-the-art methods by a large margin on several datasets and code sizes. For example, QINCo achieves better nearest-neighbor search accuracy using 12-byte codes than other methods using 16-byte codes on the BigANN and Deep1B datasets.
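
For readers unfamiliar with the baseline, here is a minimal sketch of conventional RQ with one fixed codebook per step, i.e., the scheme the abstract contrasts QINCo with. It does not implement QINCo: there is no neural network and the codebooks do not depend on earlier choices. The function names and the hand-picked 2-D codebooks are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)),
    )


def rq_encode(codebooks, x):
    """Greedy RQ: at each step, quantize the error left by the previous steps."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes


def rq_decode(codebooks, codes):
    """The reconstruction is the sum of the selected codewords."""
    recon = [0.0] * len(codebooks[0][0])
    for codebook, k in zip(codebooks, codes):
        recon = [a + c for a, c in zip(recon, codebook[k])]
    return recon


# Hypothetical codebooks: a coarse grid for step 1, small corrections for step 2.
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]],
    [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]],
]

x = [4.8, 1.1]
codes = rq_encode(CODEBOOKS, x)       # [1, 3]
recon = rq_decode(CODEBOOKS, codes)   # [5.0, 1.0]
print(codes, recon)
```

In this sketch the step-2 codebook is the same whichever codeword was picked at step 1, which is the limitation the abstract points out in conventional RQ.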

arXiv:2401.14732

Country: Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
