Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

Apr-23-2020–arXiv.org Artificial Intelligence

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. Although the dataset, which comprises 124 question-article pairs as of the present version 0.1 release, does not have enough examples for supervised machine learning, we believe that it can help evaluate the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and reports the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/
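The abstract mentions term-based baselines evaluated zero-shot on question-article pairs. As a rough illustration only, the sketch below ranks the sentences of a single article against a question with BM25 and scores the ranking by reciprocal rank. Treating the task as sentence ranking within one article, the regex tokenizer, the BM25 parameters `k1` and `b`, the function names `bm25_rank` and `reciprocal_rank`, and the toy sentences are all assumptions made for this example; it is not the authors' baseline or evaluation code.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: a minimal BM25 sentence ranker, not the CovidQA authors' code.


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; "COVID-19" becomes ["covid", "19"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(question: str, sentences: list[str], k1: float = 0.9, b: float = 0.4) -> list[int]:
    """Return sentence indices sorted by BM25 score against the question (best first)."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: the number of sentences containing each term.
    df = Counter(term for d in docs for term in set(d))
    query = tokenize(question)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # only terms present in the sentence contribute
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    # Python's sort is stable, so tied sentences keep their original order.
    return sorted(range(n), key=lambda i: -scores[i])


def reciprocal_rank(ranking: list[int], relevant: set[int]) -> float:
    """1/rank of the first relevant sentence, or 0.0 if none appears."""
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Toy article (made up for illustration), not data from CovidQA.
    sentences = [
        "The virus spreads mainly through respiratory droplets.",
        "The incubation period of COVID-19 is estimated at about five days.",
        "Hand washing reduces transmission.",
    ]
    ranking = bm25_rank("What is the incubation period of COVID-19?", sentences)
    print(ranking[0])                        # → 1
    print(reciprocal_rank(ranking, {1}))     # → 1.0
```

Averaging `reciprocal_rank` over all question-article pairs would give a mean reciprocal rank; no annotated data is needed to run the ranker, which is what makes term-based methods a natural zero-shot reference point.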

dataset, effectiveness, natural language question, …

Country:
- North America
  - United States
    - New York (0.04)
    - Texas (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Maryland > Montgomery County
      - Gaithersburg (0.05)
  - Canada > Nova Scotia
    - Halifax Regional Municipality > Halifax (0.04)
- Europe > Switzerland
  - Zürich > Zürich (0.14)
  - Geneva > Geneva (0.04)
- Asia
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
  - China
    - Zhejiang Province (0.04)
    - Hong Kong (0.04)
    - Beijing > Beijing (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (1.00)
  - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Question Answering (0.72)
    - Large Language Model (0.54)
    - Information Retrieval (0.47)
  - Machine Learning > Performance Analysis
    - Accuracy (0.40)