Quinductor: a multilingual data-driven method for generating reading-comprehension questions using Universal Dependencies
Kalpakchi, Dmytro, Boye, Johan
arXiv.org Artificial Intelligence
We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees. Our method provides a strong, mostly deterministic, and inexpensive-to-train baseline for less-resourced languages. While a language-specific corpus is still required, its size is nowhere near that required by modern neural question generation (QG) architectures. Our method surpasses QG baselines previously reported in the literature and performs well in human evaluation.
1 Introduction
We are interested in question generation (QG): the task of automatically generating reading comprehension questions and their correct answers from given declarative sentences. Numerous methods have been proposed for solving this task, most of which have been aimed at the English language. Recent methods are based on neural networks and rely on the availability of large-scale datasets, such as SQuAD (Rajpurkar et al. 2016), a question-answering dataset repurposed for QG, or large-scale pretrained models, such as GPT-3 (Brown et al. 2020). Early methods, mostly based on context-free grammars, relied on the strict word order and the limited inflectional morphology of English. These traits made it relatively straightforward to craft handwritten templates based on these grammars.
The above-mentioned idiosyncrasies and the unique availability of large-scale resources for English leave a number of open challenges for developing QG methods applicable to languages other than English. The first challenge is the lack of large-scale training datasets and the prohibitively high cost of obtaining such resources. State-of-the-art QG methods for English train their models on the previously mentioned SQuAD dataset, which contains more than 100,000 questions. Obtaining a good-quality dataset of a similar size is very expensive, especially for languages with fewer native speakers around the world.
The second challenge is knowing how well available methods developed for English would generalize to other languages, especially synthetic ones with richer inflectional morphology and less strict word order (e.g., Finnish, Turkish or Russian). To the best of our knowledge, not much research has been done on QG for these kinds of languages. The third challenge is assessing the obtained performance results.
May-12-2023