In Defense of Cross-Encoders for Zero-Shot Retrieval

Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

arXiv.org Artificial Intelligence, Dec-12-2022

Bi-encoders and cross-encoders are widely used in state-of-the-art retrieval pipelines. In this work we study how well these two architectures generalize across a wide range of parameter counts, in both in-domain and out-of-domain scenarios. We find that the number of parameters and the early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models. Our experiments show that increasing model size yields only marginal gains on in-domain test sets, but much larger gains in new domains never seen during fine-tuning. Furthermore, we show that cross-encoders substantially outperform bi-encoders of similar size on several tasks. On the BEIR benchmark, our largest cross-encoder surpasses a state-of-the-art bi-encoder by more than 4 points on average. Finally, we show that on out-of-domain tasks, using bi-encoders as first-stage retrievers provides no gains over a simpler retriever such as BM25. The code is available at https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
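
The two-stage setup the abstract refers to, a cheap first-stage retriever such as BM25 followed by a model that scores each query-document pair jointly, can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names, the whitespace tokenizer, the BM25 parameter values (`k1=0.9`, `b=0.4`), and the smoothed IDF variant are all assumptions, and `toy_joint_scorer` is a hypothetical word-overlap stand-in for a neural cross-encoder, which would instead feed the concatenated query and document through a fine-tuned transformer.

```python
import math
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 0.9, b: float = 0.4) -> List[float]:
    """Score every document against the query with BM25 (illustrative parameters)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def toy_joint_scorer(query: str, doc: str) -> float:
    # Hypothetical placeholder for a cross-encoder: it sees the query and the
    # document together and returns one relevance score (here, Jaccard overlap).
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    rerank_fn: Callable[[str, str], float],
    k: int = 2,
) -> List[int]:
    """Return document indices: BM25 picks the top-k, rerank_fn reorders them."""
    first_stage = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first_stage[i], reverse=True)[:k]
    return sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
```

Calling `retrieve_then_rerank("cross encoders score jointly", docs, toy_joint_scorer)` on a small list of strings returns the indices of the top `k` BM25 candidates, reordered by the joint scorer. Because the reranker only sees the candidates that the first stage returns, the quality of the final ranking is bounded by first-stage recall.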


Country:
- South America > Brazil
  - São Paulo (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
    - Maryland > Montgomery County
      - Gaithersburg (0.04)
- Europe
  - Netherlands (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Law (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Information Retrieval (0.71)
