Resources and Evaluations for Multi-Distribution Dense Information Retrieval

Chatterjee, Soumya, Khattab, Omar, Arora, Simran

Jun-21-2023–arXiv.org Artificial Intelligence

We introduce and define the novel problem of multi-distribution information retrieval (IR) where given a query, systems need to retrieve passages from within multiple collections, each drawn from a different distribution. Some of these collections and distributions might not be available at training time. To evaluate methods for multi-distribution retrieval, we design three benchmarks for this task from existing single-distribution datasets, namely, a dataset based on question answering and two based on entity matching. We propose simple methods for this task which allocate the fixed retrieval budget (top-k passages) strategically across domains to prevent the known domains from consuming most of the budget. We show that our methods lead to an average of 3.8+ and up to 8.0 points improvements in Recall@100 across the datasets and that improvements are consistent when fine-tuning different base retrieval models. Our benchmarks are made publicly available.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Jun-21-2023

arXiv.org PDF

Add feedback

Country:
- Europe > Isle of Man (0.04)
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > Santa Clara County
      - Stanford (0.04)
      - Palo Alto (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.05)

Genre:
- Research Report (0.50)

Industry:
- Retail (0.49)
- Media (0.46)
- Leisure & Entertainment (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Information Retrieval (0.85)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found