Distillation for Multilingual Information Retrieval
Eugene Yang, Dawn Lawrie, James Mayfield
arXiv.org Artificial Intelligence
Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework, which trains a cross-language neural dual-encoder model using translation and distillation. However, Translate-Distill supports only a single document language. Multilingual information retrieval (MLIR), which ranks a multilingual document collection, is harder to train for than CLIR because the model must assign comparable relevance scores to documents in different languages. This work extends Translate-Distill and proposes Multilingual Translate-Distill (MTD) for MLIR. We show that ColBERT-X models trained with MTD outperform their counterparts trained with Multilingual Translate-Train, the previous state-of-the-art training approach, by 5% to 25% in nDCG@20 and 15% to 45% in MAP. We also show that the model is robust to the way languages are mixed in training batches. Our implementation is available on GitHub.
May 1, 2024