Learning to Tokenize for Generative Retrieval
Weiwei Sun
Neural Information Processing Systems
As a new paradigm in information retrieval, generative retrieval directly generates a ranked list of document identifiers (docids) for a given query using generative language models (LMs). How to assign each document a unique docid (referred to as document tokenization) is a critical problem, because it determines whether the generative retrieval model can precisely retrieve any document by simply decoding its docid. Most existing methods adopt rule-based tokenization, which is ad hoc and does not generalize well.
Feb-11-2025, 04:54:31 GMT
- Country:
- Asia (0.28)
- Europe > Netherlands (0.46)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Government > Regional Government (0.46)
- Technology: