Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

Yu, HongChien; Xiong, Chenyan; Callan, Jamie

arXiv.org Artificial Intelligence 

Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on how well the encoded embeddings capture the semantics of queries and documents, a challenging task given the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes …

Retrieval with dense, fully-learned representations has the potential to address some fundamental challenges in sparse retrieval. For example, vocabulary mismatch can be solved if the embeddings accurately capture the information need behind a query and map it to relevant documents. However, decades of IR research demonstrate that inferring a user's search intent from a concise and often ambiguous search query is challenging [7]. Even with powerful pre-trained language models, it is unrealistic to expect an encoder to perfectly embed the underlying information need from a few query terms.
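Dense first-stage retrieval, as described above, amounts to scoring every document embedding against a query embedding with a simple similarity metric and keeping the highest-scoring documents. The sketch below assumes precomputed embeddings (tiny hand-written vectors stand in for encoder outputs) and inner-product similarity; `dense_retrieve` and `rocchio_prf_query` are hypothetical helper names. The feedback step is a plain Rocchio-style vector average, a classic PRF baseline swapped in for illustration only. It is not ANCE-PRF, which instead uses a BERT encoder to produce the improved query representation.

```python
import numpy as np


def dense_retrieve(query_emb, doc_embs, k):
    """Return (indices, scores) of the k documents with the highest inner product."""
    scores = doc_embs @ query_emb       # one similarity score per document
    top = np.argsort(-scores)[:k]       # sort descending by score, keep k
    return top, scores[top]


def rocchio_prf_query(query_emb, doc_embs, feedback_ids, alpha=1.0, beta=1.0):
    """Hypothetical vector PRF: shift the query toward the centroid of its
    top-ranked (pseudo-relevant) documents. Not the ANCE-PRF encoder."""
    centroid = doc_embs[feedback_ids].mean(axis=0)
    return alpha * query_emb + beta * centroid


# Toy "embeddings" standing in for encoder outputs (assumption for illustration).
docs = np.array([[1.0, 0.0],
                 [0.8, 0.6],
                 [0.0, 1.0]])
q = np.array([1.0, 0.1])

ids, scores = dense_retrieve(q, docs, k=2)        # ids -> [0, 1]
q_prf = rocchio_prf_query(q, docs, ids)           # q_prf -> [1.9, 0.4]
ids_prf, scores_prf = dense_retrieve(q_prf, docs, k=2)
```

Averaging feedback vectors is cheap but treats every top-ranked document as equally relevant, noise included; a learned query encoder such as the one the abstract proposes can, in principle, decide which feedback signals to trust.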