DAPR: A Benchmark on Document-Aware Passage Retrieval
Wang, Kexin, Reimers, Nils, Gurevych, Iryna
arXiv.org Artificial Intelligence
Recent neural retrieval mainly focuses on ranking short texts and struggles with long documents. Existing work mainly evaluates ranking either passages or whole documents. However, there are many cases where users want to find a relevant passage within a long document from a huge corpus, e.g. legal cases or research papers. In this scenario, the passage alone often provides little document context, which makes it hard for current approaches to find the correct document and return accurate results. To fill this gap, we propose this task, which we name Document-Aware Passage Retrieval (DAPR), and build a benchmark including multiple datasets from various domains, covering both DAPR and whole-document retrieval. In experiments, we extend state-of-the-art neural passage retrievers with document-level context via different approaches, including prepending a document summary, pooling over passage representations, and hybrid retrieval with BM25. The hybrid-retrieval systems, the best overall, improve only marginally on the DAPR tasks while improving significantly on the document-retrieval tasks. This motivates further research on better retrieval systems for the new task. The code and the data are available at https://github.com/kwang2049/dapr
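To make the hybrid-retrieval idea concrete, here is a minimal sketch of how a passage-level neural score might be combined with a document-level BM25 score. The abstract does not specify the fusion used in the paper, so the min-max normalization, the convex combination, the `weight` parameter, and all function names below are assumptions chosen for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    passage_scores: Dict[str, float],  # neural retriever score per passage id
    doc_scores: Dict[str, float],      # BM25 score per document id
    passage_to_doc: Dict[str, str],    # which document each passage belongs to
    weight: float = 0.5,               # assumed share given to the document score
) -> List[Tuple[str, float]]:
    """Rank passages by a convex combination of passage and document scores."""
    p_norm = min_max(passage_scores)
    d_norm = min_max(doc_scores)
    fused = {
        pid: (1.0 - weight) * p_score
        + weight * d_norm.get(passage_to_doc[pid], 0.0)
        for pid, p_score in p_norm.items()
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy example with made-up scores: p1 looks best in isolation, but its
# document dA matches the query poorly under BM25.
ranked = hybrid_rank(
    passage_scores={"p1": 0.9, "p2": 0.8, "p3": 0.1},
    doc_scores={"dA": 2.0, "dB": 10.0},
    passage_to_doc={"p1": "dA", "p2": "dB", "p3": "dB"},
)
```

In this toy case the document-level signal promotes `p2` above `p1`, which is the kind of correction document context is meant to provide when a passage by itself is ambiguous.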
May-23-2023
- Country:
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
  - Asia
    - Japan > Honshū
      - Chūbu > Aichi Prefecture > Nagoya (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Philippines > Mindanao
      - Caraga > Province of Agusan del Sur (0.04)
  - Europe
  - North America
    - Canada (0.29)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Maryland > Montgomery County
        - Gaithersburg (0.04)
      - Massachusetts > Suffolk County
        - Boston (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New York (0.04)
      - Virginia > Richmond (0.04)
      - Washington > King County
        - Seattle (0.04)
  - South America > Chile
- Genre:
  - Research Report (0.64)
- Industry:
  - Government > Regional Government
  - Health & Medicine (1.00)
  - Law (1.00)
  - Leisure & Entertainment (1.00)
  - Media > Television (0.67)
- Technology: