UnifieR: A Unified Retriever for Large-Scale Retrieval
Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Kai Zhang, Daxin Jiang
arXiv.org Artificial Intelligence
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. Depending on the encoding space, recent retrieval methods based on pre-trained language models (PLMs) can be coarsely categorized into dense-vector and lexicon-based paradigms. These two paradigms reveal the PLMs' representation capability at different granularities: global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representational views, we propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. We further present a uni-retrieval scheme that yields even better retrieval quality. Finally, we evaluate the model on the BEIR benchmark to verify its transferability.
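To make the two paradigms concrete, here is a minimal sketch of how a query-document pair might be scored under each view, assuming the encoder has already produced its outputs. The dense view compresses a sequence into one global vector and scores by inner product. The lexicon view turns per-token vocabulary logits into a sparse bag of weighted terms. The `log(1 + relu(logit))` weighting with max-pooling over token positions is a SPLADE-style assumption, not necessarily what UnifieR uses. `hybrid_score` and its `alpha` weight are hypothetical and only illustrate combining both views. This is not the paper's uni-retrieval scheme. All function names and toy values are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]
SparseRep = Dict[str, float]


def dense_score(q_vec: Vector, d_vec: Vector) -> float:
    """Dense-vector view: one global sequence-level vector each, scored by inner product."""
    return sum(a * b for a, b in zip(q_vec, d_vec))


def lexicon_rep(token_logits: List[Dict[str, float]]) -> SparseRep:
    """Lexicon-based view (SPLADE-style weighting, an assumption):
    each token position gives logits over vocabulary terms; each term's weight is
    log(1 + relu(logit)), max-pooled over positions. Zero-weight terms are dropped,
    so the result stays sparse."""
    rep: SparseRep = {}
    for position in token_logits:
        for term, logit in position.items():
            weight = math.log1p(max(logit, 0.0))
            if weight > rep.get(term, 0.0):
                rep[term] = weight
    return rep


def lexicon_score(q_rep: SparseRep, d_rep: SparseRep) -> float:
    """Sparse dot product over terms shared by the query and the document."""
    return sum(w * d_rep[t] for t, w in q_rep.items() if t in d_rep)


def hybrid_score(q_vec: Vector, d_vec: Vector,
                 q_rep: SparseRep, d_rep: SparseRep,
                 alpha: float = 0.5) -> float:
    """Hypothetical linear fusion of both views; alpha is illustrative, not from the paper."""
    return alpha * dense_score(q_vec, d_vec) + (1 - alpha) * lexicon_score(q_rep, d_rep)


if __name__ == "__main__":
    # Toy encoder outputs (made up for illustration).
    q_vec, d_vec = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0]
    q_rep = lexicon_rep([{"retrieval": math.e - 1, "dense": -1.0}, {"retrieval": 0.5}])
    d_rep = lexicon_rep([{"retrieval": math.e ** 2 - 1, "sparse": 1.0}])
    print("dense:", dense_score(q_vec, d_vec))
    print("lexicon:", lexicon_score(q_rep, d_rep))
    print("hybrid:", hybrid_score(q_vec, d_vec, q_rep, d_rep))
```

The negative logit for "dense" is clipped to zero and never enters the query representation. This shows how the lexicon view stays sparse and only matches on terms both sides actually weight.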
Jun-4-2023