Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Oct-11-2024, 00:29:00 GMT–Neural Information Processing Systems

Multi-vector retrieval models such as ColBERT [Khattab et al., 2020] allow token-level interactions between queries and documents, and hence achieve state-of-the-art results on many information retrieval benchmarks. However, their non-linear scoring function cannot be scaled to millions of documents, necessitating a three-stage inference process: retrieving initial candidates via token retrieval, accessing all token vectors of those candidates, and scoring the initial candidate documents. Because the non-linear scoring function is applied over all token vectors of each candidate document, inference is complicated and slow. In this paper, we aim to simplify multi-vector retrieval by rethinking the role of token retrieval. We present XTR, ConteXtualized Token Retriever, which introduces a simple yet novel objective function that encourages the model to retrieve the most important document tokens first.
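The three-stage process described above can be sketched on toy data. This is a minimal illustration, not the authors' implementation: it assumes the non-linear scoring function is the sum-of-max ("MaxSim") operator used by ColBERT, uses brute-force dot products in place of an approximate nearest-neighbor index, and all names (`sum_of_max`, `token_retrieval`, `retrieve`, `corpus`, `query`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sum_of_max(query: List[Vector], doc: List[Vector]) -> float:
    """Assumed ColBERT-style score: for each query token, take its
    best-matching document token, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def token_retrieval(query: List[Vector],
                    corpus: Dict[str, List[Vector]],
                    k: int) -> Set[str]:
    """Stage 1: for each query token, find the top-k document tokens
    (brute force here) and collect the documents they belong to."""
    candidates: Set[str] = set()
    for q in query:
        scored = [(dot(q, d), doc_id)
                  for doc_id, tokens in corpus.items()
                  for d in tokens]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        candidates.update(doc_id for _, doc_id in scored[:k])
    return candidates


def retrieve(query: List[Vector],
             corpus: Dict[str, List[Vector]],
             k: int = 2) -> List[Tuple[str, float]]:
    candidates = token_retrieval(query, corpus, k)
    # Stage 2: gather ALL token vectors of every candidate document.
    gathered = {doc_id: corpus[doc_id] for doc_id in candidates}
    # Stage 3: apply the non-linear score over the full token sets.
    ranked = sorted(((sum_of_max(query, tokens), doc_id)
                     for doc_id, tokens in gathered.items()),
                    reverse=True)
    return [(doc_id, score) for score, doc_id in ranked]


# Toy corpus: each document is a list of 2-d token vectors (made-up values).
corpus = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],
    "d2": [[0.5, 0.5], [0.5, 0.5]],
    "d3": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

results = retrieve(query, corpus, k=2)
print(results)
```

In this sketch, stage 2 is where the cost concentrates: every token vector of every candidate document has to be loaded before scoring, even though stage 1 already computed similarities for the retrieved tokens.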

multi-vector retrieval, retrieval, token retrieval

Technology:
- Information Technology > Artificial Intelligence (0.41)