Data-Efficient Autoregressive Document Retrieval for Fact Verification

Nov-17-2022–arXiv.org Artificial Intelligence

Document retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification and question answering. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given a query. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that does not require any annotation to train autoregressive retrievers that attain competitive R-Precision and Recall in a zero-shot setting. Furthermore we show that with task-specific supervised fine-tuning, autoregressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed full supervision using less than $1/4$ of the annotated data indicating possible directions for data-efficient autoregressive retrieval.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

Nov-17-2022

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Italy > Trentino-Alto Adige/Südtirol
    - Trentino Province > Trento (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Lebanon > Beirut Governorate
    - Beirut (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Question Answering (0.36)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found