COVIDRead: A Large-scale Question Answering Dataset on COVID-19

Saikh, Tanik, Sahoo, Sovan Kumar, Ekbal, Asif, Bhattacharyya, Pushpak

Oct-5-2021–arXiv.org Artificial Intelligence

During the pandemic, extracting relevant information about COVID-19 is immensely beneficial to the community at large. In this paper, we present COVIDRead, a Stanford Question Answering Dataset (SQuAD)-like resource of more than 100k question-answer pairs. The dataset consists of Context-Answer-Question triples. The questions are first constructed from the context in an automated way; the system-generated questions are then manually checked by human annotators. The resource could serve many purposes, ranging from answering the general public's queries about this uncommon disease to helping journal editors and associate editors manage articles. We establish several end-to-end neural network based baseline models, whose F1 scores range from 32.03% (lowest) to 37.19% (highest). To the best of our knowledge, we are the first to provide a QA dataset of this kind at such a large scale on COVID-19. The dataset opens a new avenue for research on COVID-19 by providing a benchmark dataset and baseline models.
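
To make the Context-Answer-Question triple and the F1 figures more concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' data format or evaluation code. The field names (`context`, `question`, `answer`, `answer_start`) follow SQuAD conventions and are assumed here, the example record is invented, and `token_f1` is a simplified token-overlap F1 in the style of SQuAD evaluation (without its punctuation and article normalization); the abstract does not say which F1 variant was used.

```python
from collections import Counter

# Hypothetical SQuAD-style record; the field names are assumptions,
# not the published COVIDRead schema.
example = {
    "context": "COVID-19 is caused by the SARS-CoV-2 virus.",
    "question": "Which virus causes COVID-19?",
    "answer": {"text": "the SARS-CoV-2 virus", "answer_start": 22},
}


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    answer = example["answer"]
    start = answer["answer_start"]
    # The answer span should be recoverable from the context by offset.
    span = example["context"][start:start + len(answer["text"])]
    print(span == answer["text"])
    print(token_f1("SARS-CoV-2 virus", answer["text"]))
```

In this sketch a prediction that drops the leading article still receives partial credit, which is why span-extraction results are usually reported with F1 rather than exact match alone.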

computational linguistics, covid-19, dataset

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Ohio (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Hampshire > Southampton (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - India
    - Maharashtra > Mumbai (0.04)
    - Bihar > Patna (0.04)
  - China > Hubei Province
    - Wuhan (0.04)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine
  - Epidemiology (1.00)
  - Therapeutic Area
    - Infections and Infectious Diseases (1.00)
    - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Question Answering (0.85)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)