Unsupervised Domain Adaptation for Neural Information Retrieval

Carlos Dominguez, Jon Ander Campos, Eneko Agirre, Gorka Azkune

arXiv.org Artificial Intelligence 

Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using Large Language Models or rule-based string manipulation has been proposed as an alternative, but their relative merits have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained on a large out-of-domain dataset (MS-MARCO); and unsupervised domain adaptation, where, in addition to MS-MARCO, the system is fine-tuned on synthetic data from the target domain. Our results indicate that Large Language Models outperform rule-based methods in all scenarios by a large margin and, more importantly, that unsupervised domain adaptation is effective compared to applying a supervised IR system in a zero-shot fashion. In addition, we explore several sizes of open Large Language Models to generate queries.

Figure 1: Experimental design: (left) a supervised retriever is trained with manual annotations from MS-MARCO; (middle) an unsupervised retriever is trained with automatically generated queries for MS-MARCO documents; (right) an unsupervised domain adaptation retriever is trained with both MS-MARCO manual annotations and automatically generated queries for in-domain BEIR dataset documents. Evaluation is performed on BEIR, producing two scenarios: zero-shot (left and middle retrievers); unsupervised domain adaptation (right retriever).
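The two synthetic-annotation strategies compared in the paper can be illustrated with a short sketch. The Python code below is not the authors' implementation; the model name, prompt wording, and random-span rule are illustrative assumptions. It shows how synthetic (query, document) training pairs can be produced from unlabeled target-domain documents, either by prompting an open LLM or by rule-based string manipulation:

    import random
    from transformers import pipeline

    # Any open LLM can be plugged in here; model size is one of the
    # variables the paper studies. The name below is an assumption.
    generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

    def llm_query(document: str, max_new_tokens: int = 32) -> str:
        """Prompt the LLM for a query that the document answers."""
        prompt = ("Write a search query that the following document answers.\n\n"
                  f"Document: {document}\n\nQuery:")
        out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
        # generated_text echoes the prompt; keep only the continuation.
        return out[0]["generated_text"][len(prompt):].strip()

    def rule_based_query(document: str, span_len: int = 8) -> str:
        """A common rule-based baseline: take a random span of the document
        as a pseudo-query (illustrative; the paper's exact rules may differ)."""
        words = document.split()
        start = random.randrange(max(1, len(words) - span_len))
        return " ".join(words[start:start + span_len])

Each resulting (query, document) pair serves as a positive training example for fine-tuning the neural retriever on the target BEIR domain, which is what enables the unsupervised domain adaptation scenario without any manual in-domain annotation.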
