Large-scale Cloze Test Dataset Created by Teachers

Xie, Qizhe, Lai, Guokun, Dai, Zihang, Hovy, Eduard

Aug-27-2018, arXiv.org Artificial Intelligence

Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose CLOTH, the first large-scale human-created cloze test dataset, containing questions used in middle-school and high-school language exams. With blanks carefully chosen by teachers and candidate choices purposely designed to be nuanced, CLOTH requires deeper language understanding and a wider attention span than previous automatically generated cloze datasets. We test purpose-built baseline models, including a language model trained on the One Billion Word Corpus, and show that humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to distinct properties of CLOTH, and identify a limited ability to comprehend long-term context as the key bottleneck.

dataset, deep learning, language learning

Country:
- Asia (0.28)

Genre:
- Research Report (0.64)

Industry:
- Education
  - Curriculum > Subject-Specific Education (0.66)
  - Educational Setting > K-12 Education (0.56)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.50)
  - Natural Language (1.00)
