Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation
Yuksel, Goksenin, Rau, David, Kamps, Jaap
arXiv.org Artificial Intelligence
Dense retrievers have demonstrated significant potential for neural information retrieval; however, they lack robustness to domain shifts, which limits their efficacy in zero-shot settings across diverse domains. A state-of-the-art domain adaptation technique is Generative Pseudo Labeling (GPL). GPL uses synthetic query generation and initially mined hard negatives to distill knowledge from a cross-encoder to a dense retriever in the target domain. In this paper, we analyze the documents retrieved by the domain-adapted model and discover that they are more relevant to the target queries than those retrieved by the non-domain-adapted model. We then propose refreshing the hard-negative index during the knowledge distillation phase to mine better hard negatives. Our re-mining approach, R-GPL, boosts ranking performance on 13/14 BEIR datasets and 9/12 LoTTE datasets. Our contributions are (i) analyzing hard negatives returned by domain-adapted and non-domain-adapted models and (ii) applying GPL training with and without hard-negative re-mining on the LoTTE and BEIR datasets.
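The core difference between vanilla GPL and the proposed R-GPL is *when* hard negatives are mined: once up front with the non-adapted model, versus periodically with the evolving student during distillation. The toy sketch below illustrates that refresh loop under stated assumptions; all function and variable names (`embed`, `mine_hard_negatives`, `train_with_remining`) are hypothetical stand-ins, not the paper's actual code, and the encoder is a deterministic pseudo-embedding rather than a real dense retriever.

```python
import random

# Hypothetical sketch of hard-negative re-mining (R-GPL idea):
# periodically re-encode the corpus with the *current* student model
# and refresh the hard-negative pool, instead of mining once with the
# non-adapted model before training starts (as in vanilla GPL).

def embed(model_weights, text_id, dim=4):
    # Stand-in encoder: a deterministic pseudo-embedding seeded by the
    # (toy integer) model state and the document/query id.
    rng = random.Random(model_weights * 100_000 + text_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mine_hard_negatives(model_weights, query_id, positive_id, corpus_ids, k=2):
    # Score every document against the query with the current model and
    # return the top-k highest-scoring non-positive documents: these are
    # the "hard" negatives under the current model's view of the corpus.
    q = embed(model_weights, query_id)
    ranked = sorted(
        (d for d in corpus_ids if d != positive_id),
        key=lambda d: dot(q, embed(model_weights, d)),
        reverse=True,
    )
    return ranked[:k]

def train_with_remining(num_steps=10, refresh_every=5):
    corpus = list(range(100, 120))
    query_id, positive_id = 1, 100
    model_weights = 0  # stands in for the evolving student parameters
    # Initial mining, as in vanilla GPL (non-adapted model).
    negatives = mine_hard_negatives(model_weights, query_id, positive_id, corpus)
    history = []
    for step in range(1, num_steps + 1):
        model_weights += 1  # pretend a distillation update happened
        if step % refresh_every == 0:
            # R-GPL: refresh the index with the partially adapted student.
            negatives = mine_hard_negatives(
                model_weights, query_id, positive_id, corpus
            )
        history.append(list(negatives))
    return history
```

In a real pipeline the refresh step would re-encode the corpus with the student checkpoint and rebuild an ANN index (e.g. with Faiss) before sampling negatives; the toy loop only shows where that refresh slots into distillation.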
Jan-24-2025