Denoising Table-Text Retrieval for Open-Domain Question Answering
Deokhyung Kang, Baikjin Jung, Yunsu Kim, Gary Geunbae Lee
arXiv.org Artificial Intelligence
In table-text open-domain question answering, a retriever retrieves relevant evidence from tables and text to answer questions. Previous studies share two common limitations: first, their retrievers can be affected by false-positive labels in training datasets; second, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose the Denoised Table-Text Retriever (DoTTeR). Our approach trains on a denoised dataset with fewer false-positive labels, obtained by discarding instances with lower question-relevance scores as measured by a false-positive detection model. We then integrate table-level ranking information into the retriever to help it find evidence for questions that demand reasoning across the table. To encode this ranking information, we fine-tune a rank-aware column encoder to identify the minimum and maximum values within a column. Experimental results demonstrate that DoTTeR significantly outperforms strong baselines on both retrieval recall and downstream QA tasks. Our code is available at https://github.com/deokhk/DoTTeR.
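The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TrainingInstance` type, the fixed `threshold`, and the helper names `denoise` and `min_max_labels` are all hypothetical, and the relevance score is assumed to come from some false-positive detection model that is not shown. The first function drops low-scoring training instances; the second produces the kind of per-cell minimum/maximum labels a rank-aware column encoder could be fine-tuned to predict.

```python
from dataclasses import dataclass


@dataclass
class TrainingInstance:
    question: str
    evidence: str
    relevance: float  # hypothetical score from a false-positive detection model


def denoise(instances, threshold=0.5):
    """Keep instances whose relevance score is at least `threshold`.

    The fixed threshold is an assumption made for illustration only.
    """
    return [inst for inst in instances if inst.relevance >= threshold]


def min_max_labels(column):
    """Label each value of a numeric column as 'min', 'max', or 'other'."""
    if not column:
        return []
    lo, hi = min(column), max(column)
    labels = []
    for value in column:
        if value == hi:
            labels.append("max")
        elif value == lo:
            labels.append("min")
        else:
            labels.append("other")
    return labels
```

For example, `denoise` applied to two instances scored 0.9 and 0.2 keeps only the first under the default threshold, and `min_max_labels([3, 1, 7])` marks 1 as the minimum and 7 as the maximum.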
Mar-26-2024