Generative Dense Retrieval: Memory Can Be a Burden

Yuan, Peiwen, Wang, Xinglin, Feng, Shaoxiong, Pan, Boyuan, Li, Yiwei, Wang, Heda, Miao, Xupeng, Li, Kan

Jan-18-2024–arXiv.org Artificial Intelligence

Generative Retrieval (GR), autoregressively decoding relevant document identifiers given a query, has been shown to perform well under the setting of small-scale corpora. By memorizing the document corpus with model parameters, GR implicitly achieves deep interaction between query and document. However, such a memorizing mechanism faces three drawbacks: (1) Poor memory accuracy for fine-grained features of documents; (2) Memory confusion gets worse as the corpus size increases; (3) Huge memory update costs for new documents. To alleviate these problems, we propose the Generative Dense Retrieval (GDR) paradigm. Specifically, GDR first uses the limited memory volume to achieve inter-cluster matching from query to relevant document clusters. Memorizing-free matching mechanism from Dense Retrieval (DR) is then introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents. The coarse-to-fine process maximizes the advantages of GR's deep interaction and DR's scalability. Besides, we design a cluster identifier constructing strategy to facilitate corpus memory and a cluster-adaptive negative sampling strategy to enhance the intra-cluster mapping ability. Empirical results show that GDR obtains an average of 3.0 R@100 improvement on NQ dataset under multiple settings and has better scalability.

gdr, identifier, retrieval, (15 more...)

arXiv.org Artificial Intelligence

Jan-18-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania
      - Philadelphia County > Philadelphia (0.04)
      - Allegheny County > Pittsburgh (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > San Francisco County
      - San Francisco (0.04)
- Europe
  - Austria (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report > New Finding (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Information Retrieval (1.00)
  - Machine Learning (1.00)