COVIDRead: A Large-scale Question Answering Dataset on COVID-19
Saikh, Tanik, Sahoo, Sovan Kumar, Ekbal, Asif, Bhattacharyya, Pushpak
arXiv.org Artificial Intelligence
During this pandemic, extracting relevant information about COVID-19 is immensely beneficial to the community at large. In this paper, we present COVIDRead, a Stanford Question Answering Dataset (SQuAD)-like dataset of more than 100k question-answer pairs. The dataset consists of Context-Answer-Question triples. The questions are first constructed from the context in an automated way, and the system-generated questions are then manually checked by human annotators. This resource could serve many purposes, ranging from answering common people's queries about this very uncommon disease to helping editors and associate editors of a journal manage articles. We establish several end-to-end neural network based baseline models, which attain F1 scores ranging from 32.03% (lowest) to 37.19% (highest). To the best of our knowledge, we are the first to provide a QA dataset on COVID-19 at such a large volume. This dataset opens a new avenue for research on COVID-19 by providing a benchmark dataset and baseline models.
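To make the Context-Answer-Question triple and the reported F1 figures more concrete, the following is a minimal sketch, assuming the token-overlap F1 commonly used for SQuAD-style evaluation. The abstract does not state which F1 variant the authors use, and the record below, its field names, and the `token_f1` helper are hypothetical illustrations rather than material from COVIDRead.

```python
from collections import Counter

# Hypothetical triple: the field names and text are illustrative only,
# not taken from the COVIDRead dataset.
example = {
    "context": "Wash hands often with soap and water for at least 20 seconds.",
    "question": "How long should hands be washed?",
    "answer": "at least 20 seconds",
}


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span.

    Assumption: lowercased, whitespace-split tokens; the official SQuAD
    script also strips punctuation and articles, which is omitted here.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A partial answer gets partial credit under this metric.
    print(token_f1("20 seconds", example["answer"]))
```

Under this metric a prediction that recovers only part of the gold span still earns partial credit, which is one reason F1 is usually reported alongside exact match for extractive QA.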
Oct-5-2021
- Country:
  - Oceania > Australia
  - North America
    - United States
      - Ohio (0.04)
      - Texas > Travis County
        - Austin (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
    - Canada > British Columbia
  - Europe
    - United Kingdom > England
      - Hampshire > Southampton (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
  - Asia
    - India
      - Maharashtra > Mumbai (0.04)
      - Bihar > Patna (0.04)
    - China > Hubei Province
      - Wuhan (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
- Technology: