Bottleneck-Minimal Indexing for Generative Document Retrieval
Xin Du, Lixin Xiu, Kumiko Tanaka-Ishii
arXiv.org Artificial Intelligence
We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be considered to involve information transmission from documents $X$ to queries $Q$, with the requirement to transmit more bits via the indexes $T$. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes $T$ can then be regarded as a "bottleneck" in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods.
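To make the quantity behind this analysis concrete, the following minimal sketch computes mutual information in bits for a toy indexing of four documents by two index values. It is an illustration only: the function name `mutual_information`, the document and index labels, and the probabilities are hypothetical and are not taken from the paper, and it does not implement the proposed indexing method.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Return I(A;B) in bits for a joint distribution given as {(a, b): p}."""
    pa = defaultdict(float)  # marginal distribution of A
    pb = defaultdict(float)  # marginal distribution of B
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over entries with p > 0.
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Hypothetical toy case: four equally likely documents, two index values.
# Deterministic indexing: d0, d1 -> "t0"; d2, d3 -> "t1".
joint_xt = {
    ("d0", "t0"): 0.25,
    ("d1", "t0"): 0.25,
    ("d2", "t1"): 0.25,
    ("d3", "t1"): 0.25,
}
bits = mutual_information(joint_xt)

# Contrast: an index assigned independently of the document carries no information.
joint_independent = {
    (d, t): 0.125 for d in ("d0", "d1", "d2", "d3") for t in ("t0", "t1")
}
bits_independent = mutual_information(joint_independent)
```

In this toy case the two-valued index conveys 1 bit about the document, out of the 2 bits needed to identify one of four equally likely documents, while the independent index conveys none.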
May-20-2024
- Country:
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
  - Europe
  - Asia
    - Singapore (0.04)
    - Indonesia > Bali (0.04)
    - Middle East > Israel
      - Jerusalem District > Jerusalem (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Genre:
  - Research Report (0.82)
- Technology: