NLP and Education: using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom

de Gois, Túlio Sousa, Freitas, Flávia Oliveira, Tejada, Julian, Freitag, Raquel Meister Ko.

arXiv.org Artificial Intelligence 

Since the mid-twentieth century, the Cloze test has been used for educational purposes to assess reading comprehension proficiency in different languages Taylor [1953], Brown [1980, 2002]. The task consists of systematically filling in gaps in a text, specifically a prose passage Bickley et al. [1970] previously adapted to the participants' context, and the number of correct answers is taken as an indicator of the participant's degree of comprehension of the text. Since Taylor's initial proposal Taylor [1953], different measures have been used to score gap-filling, such as exact answer, acceptable answer Brown [1980], multiple choice, and Clozentropy Darnell [1968], Lowry and Marr [1975]; these measures are examined further in Section 2. The exact-answer measure may seem the easiest to compute, especially when a Cloze test is applied to large and heterogeneous groups of students and teachers lack the time to analyze each answer individually. In Brazil, for instance, teachers usually have to manage numerous classes, and this scoring method provides rapid feedback on students' reading proficiency, allowing answers to be checked objectively Cunha and Santos [2010] without having to weigh possible alternative responses.
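To make the contrast between the first two measures concrete, the following minimal Python sketch scores one student's responses under the exact-answer criterion and under an acceptable-answer criterion. The function names, the per-gap sets of acceptable words, and the case/whitespace normalization are illustrative assumptions rather than the authors' procedure. In practice, the lists of acceptable answers would have to be compiled by a human rater, which is precisely the cost that the exact-answer method avoids.

```python
from typing import Sequence, Set


def normalize(word: str) -> str:
    # Assumption: trim surrounding whitespace and lowercase, so "Dog " and "dog"
    # count as the same answer. The source does not specify a normalization policy.
    return word.strip().lower()


def exact_answer_score(responses: Sequence[str], key: Sequence[str]) -> float:
    """Proportion of gaps filled with exactly the word deleted from the original text."""
    if not key or len(responses) != len(key):
        raise ValueError("expected one response per gap and at least one gap")
    hits = sum(normalize(r) == normalize(k) for r, k in zip(responses, key))
    return hits / len(key)


def acceptable_answer_score(responses: Sequence[str], acceptable: Sequence[Set[str]]) -> float:
    """Proportion of gaps whose response appears in that gap's set of acceptable words.

    The acceptable sets are hypothetical inputs that a rater would supply.
    """
    if not acceptable or len(responses) != len(acceptable):
        raise ValueError("expected one response per gap and at least one gap")
    hits = sum(
        normalize(r) in {normalize(a) for a in options}
        for r, options in zip(responses, acceptable)
    )
    return hits / len(acceptable)


if __name__ == "__main__":
    key = ["house", "dog", "ran"]                        # words deleted from the text
    responses = ["home", "dog", "ran"]                   # one student's fillers
    acceptable = [{"house", "home"}, {"dog"}, {"ran"}]   # rater-approved alternatives
    print(exact_answer_score(responses, key))            # 2 of 3 gaps match exactly
    print(acceptable_answer_score(responses, acceptable))  # all 3 gaps are acceptable
```

In this toy example the exact-answer score is 2/3 (about 0.67) while the acceptable-answer score is 1.0: the plausible synonym "home" is counted as wrong under the exact-answer criterion, even though it arguably reflects comprehension of the passage.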