A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics
Mohammadi, Azade; Ramezani, Reza; Baraani, Ahmad
arXiv.org Artificial Intelligence
Abstract: Multi-hop machine reading comprehension (MRC) is a challenging task that aims to answer a question by combining disjoint pieces of information spread across multiple passages. Evaluation metrics and datasets are a vital part of multi-hop MRC: models cannot be trained or evaluated without them, and the challenges posed by new datasets are often an important motivation for improving existing models. Given the growing attention to this field, a detailed review of them is both necessary and worthwhile. This study presents a comprehensive survey of recent advances in multi-hop MRC evaluation metrics and datasets. First, the multi-hop MRC problem is defined; then the evaluation metrics are examined with respect to their multi-hop aspects. In addition, 15 multi-hop datasets published between 2017 and 2022 are reviewed in detail, followed by a comprehensive comparative analysis. Finally, open issues in the field are discussed.

Keywords: Multi-hop Machine Reading Comprehension, Multi-hop Machine Reading Comprehension Dataset, Natural Language Processing

1-INTRODUCTION

Machine reading comprehension (MRC) is one of the most important and long-standing topics in Natural Language Processing (NLP). MRC provides a way to evaluate an NLP system's capability for natural language understanding. In brief, an MRC task refers to the ability of a computer to read and understand a natural-language context and then answer questions about that context. The emergence of large-scale single-document MRC datasets, such as SQuAD (Rajpurkar et al., 2016) and CNN/Daily Mail (Hermann et al., 2015), has drawn increased attention to this topic, and various models have been proposed to address the MRC problem. However, for many of these datasets, it has been found that models do not need to comprehend and reason in order to answer a question. For example, Khashabi et al. (2016) showed that adversarial perturbation of candidate answers degrades the performance of QA systems. Similarly, Jia and Liang (2017) showed that adding an adversarial sentence to a SQuAD (Rajpurkar et al., 2016) context sharply drops the accuracy of many existing models.
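Although the abstract does not name them, the answer-level metrics most multi-hop datasets (e.g., HotpotQA) reuse are the SQuAD-style exact match (EM) and token-level F1. The sketch below is a minimal, self-contained Python illustration of these two metrics applied to a hypothetical two-hop example; the example data, the normalization details, and all names are illustrative assumptions, not code from the survey.

```python
# Minimal sketch of SQuAD-style answer metrics (EM and token-level F1),
# plus a hypothetical two-hop example in a HotpotQA-like format.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 iff the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between the predicted and gold answers."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical two-hop example: answering requires combining facts
# from two disjoint passages (this instance is invented for illustration).
example = {
    "question": "In which country was the director of Film X born?",
    "passages": [
        "Film X was directed by Jane Doe.",   # hop 1: question -> director
        "Jane Doe was born in Portugal.",     # hop 2: director -> country
    ],
    "answer": "Portugal",
}

print(exact_match("Portugal", example["answer"]))    # 1.0
print(f1_score("in Portugal", example["answer"]))    # ~0.67
```

HotpotQA additionally applies the same EM/F1 machinery to supporting-fact prediction and combines answer and supporting-fact scores into joint metrics, which is one way a dataset's evaluation acquires a specifically multi-hop aspect.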
Dec-7-2022