Improving Document Retrieval Coherence for Semantically Equivalent Queries
Campese, Stefano, Moschitti, Alessandro, Lauriola, Ivano
arXiv.org Artificial Intelligence
Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized to improve the relevance of the top-ranked documents for a given query. Previous work has shown that popular DR models are sensitive to the query and document lexicon: small lexical variations may lead to a significant difference in the set of retrieved documents. In this paper, we propose a variation of the Multi-Negative Ranking loss for training DR models that improves their coherence, that is, their ability to retrieve the same documents for semantically similar queries. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries. We conducted extensive experiments on various datasets: MS-MARCO, Natural Questions, BEIR, and TREC DL 19/20. The results show that models optimized with our loss exhibit (i) lower sensitivity and, interestingly, (ii) higher accuracy.
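The idea of a ranking loss extended with a coherence term can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes an in-batch Multi-Negative Ranking loss over cosine scores and, as a stand-in for the top-k discrepancy penalty, a symmetric KL divergence between the score distributions that two paraphrases of the same query induce over the in-batch documents. The names `coherence_penalty`, `coherent_mnr_loss`, `lam`, and `scale` are hypothetical.

```python
import numpy as np


def normalize(x):
    """L2-normalize each row so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def mnr_loss(q, d, scale=20.0):
    """In-batch ranking loss: d[i] is the positive for q[i],
    every other document in the batch acts as a negative."""
    scores = scale * (q @ d.T)                      # shape (B, B)
    return -np.mean(np.diag(log_softmax(scores, axis=1)))


def coherence_penalty(q_a, q_b, d, scale=20.0):
    """ASSUMPTION: symmetric KL between the document distributions of two
    paraphrases; a proxy for 'same top-k documents', not the paper's term."""
    lp_a = log_softmax(scale * (q_a @ d.T), axis=1)
    lp_b = log_softmax(scale * (q_b @ d.T), axis=1)
    kl_ab = np.sum(np.exp(lp_a) * (lp_a - lp_b), axis=1)
    kl_ba = np.sum(np.exp(lp_b) * (lp_b - lp_a), axis=1)
    return 0.5 * np.mean(kl_ab + kl_ba)


def coherent_mnr_loss(q_a, q_b, d, lam=1.0, scale=20.0):
    """Average ranking loss of both paraphrases plus the weighted penalty.
    lam (hypothetical) trades relevance against coherence."""
    rank = 0.5 * (mnr_loss(q_a, d, scale) + mnr_loss(q_b, d, scale))
    return rank + lam * coherence_penalty(q_a, q_b, d, scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = normalize(rng.normal(size=(4, 8)))
    query = normalize(rng.normal(size=(4, 8)))
    # Simulate a paraphrase as a small perturbation of the query embedding.
    paraphrase = normalize(query + 0.3 * rng.normal(size=(4, 8)))
    print("ranking only:", coherent_mnr_loss(query, paraphrase, docs, lam=0.0))
    print("with penalty:", coherent_mnr_loss(query, paraphrase, docs, lam=1.0))
```

In this sketch the penalty is zero when both paraphrases produce identical embeddings and grows as their rankings over the batch diverge, so minimizing it pushes semantically equivalent queries toward the same retrieved set.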
Aug-12-2025
- Country:
  - Asia
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.14)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe > Ireland
    - Leinster > County Dublin > Dublin (0.04)
  - North America
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > San Francisco County
        - San Francisco (0.04)
      - Florida > Miami-Dade County
        - Miami (0.04)
      - New Mexico > Bernalillo County
        - Albuquerque (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Technology: