Span Selection Pre-training for Question Answering
Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avirup Sil
BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension and by an effort to avoid encoding general knowledge in the transformer network itself. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple machine reading comprehension (MRC) and paraphrasing datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also establish a new SOTA on HotpotQA, improving answer prediction by 4 F1 points and supporting fact prediction by 1 F1 point. Moreover, we show that our pre-training approach is particularly effective when training data is limited, substantially improving the learning curve.
arXiv.org Artificial Intelligence
Sep-9-2019
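
The abstract describes the new pre-training task only at a high level. As an illustration, and not the authors' released pipeline, the sketch below shows one plausible way a span-selection pre-training instance could be assembled: a term in a query sentence is replaced with a blank marker, and the training target is that term's span in a related passage. The class and function names (`SpanSelectionInstance`, `make_instance`) are hypothetical.

```python
# Illustrative sketch only: constructing one span-selection pre-training instance.
# A query sentence has one term blanked out; the model would later be trained to
# select the span in a related passage that fills the blank.

from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanSelectionInstance:
    query: str         # cloze-style query with the answer term replaced by [BLANK]
    passage: str       # related passage that contains the answer term
    answer_start: int  # character offset where the answer span begins in the passage
    answer_end: int    # exclusive end offset of the answer span


def make_instance(query_sentence: str, answer_term: str,
                  passage: str) -> Optional[SpanSelectionInstance]:
    """Blank the answer term in the query and locate it in the passage.

    Returns None if either string lacks the term; such pairs would simply be
    skipped when generating pre-training data.
    """
    if answer_term not in query_sentence:
        return None
    start = passage.find(answer_term)
    if start == -1:
        return None
    return SpanSelectionInstance(
        query=query_sentence.replace(answer_term, "[BLANK]", 1),
        passage=passage,
        answer_start=start,
        answer_end=start + len(answer_term),
    )


if __name__ == "__main__":
    inst = make_instance(
        query_sentence="BERT is pre-trained on the Masked Language Model task.",
        answer_term="Masked Language Model",
        passage="Devlin et al. introduce the Masked Language Model objective, "
                "in which randomly masked tokens are predicted from context.",
    )
    print(inst)
```

Framing the target as a span in an external passage, rather than a token predicted from the model's own parameters, is consistent with the abstract's stated goal of avoiding encoding general knowledge in the transformer network itself.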