Collaborating Authors: Briscoe, Ted


Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection

arXiv.org Artificial Intelligence

Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. Arabic AES systems are particularly challenged by the lack of annotated essay datasets. This paper presents a novel framework leveraging Large Language Models (LLMs) and Transformers to generate synthetic Arabic essay datasets for AES. We prompt an LLM to generate essays across CEFR proficiency levels and introduce controlled error injection using a fine-tuned Standard Arabic BERT model for error type prediction. Our approach produces realistic human-like essays, contributing a dataset of 3,040 annotated essays. Additionally, we develop a BERT-based auto-marking system for accurate and scalable Arabic essay evaluation. Experimental results demonstrate the effectiveness of our framework in improving Arabic AES performance.
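A minimal sketch of how such a two-stage pipeline could look, assuming Hugging Face transformers; the checkpoint names, the prompt template, and the corruption rules are illustrative placeholders, not the paper's exact setup:

```python
# Sketch of the two-stage pipeline: LLM essay generation, then controlled
# error injection. Checkpoint names below are hypothetical placeholders.
import random
from transformers import pipeline

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Stage 1: prompt an LLM for an essay at a target proficiency level.
generator = pipeline("text-generation", model="YOUR-ARABIC-LLM")  # placeholder

def generate_essay(topic: str, level: str) -> str:
    prompt = (f"Write a short Arabic essay on '{topic}' at CEFR level {level}, "
              "using vocabulary and grammar typical of that level.\n")
    out = generator(prompt, max_new_tokens=400, do_sample=True)
    return out[0]["generated_text"][len(prompt):]

# Stage 2: a fine-tuned Arabic BERT tagger predicts a plausible error type
# per token; a per-type rule then corrupts the token accordingly.
error_tagger = pipeline("token-classification", model="YOUR-BERT-ERROR-TAGGER")  # placeholder

def corrupt(token: str, error_type: str) -> str:
    if error_type == "SPELL" and len(token) > 1:      # drop one character
        i = random.randrange(len(token))
        return token[:i] + token[i + 1:]
    return token                                      # other types: no-op in this sketch

def inject_errors(essay: str, rate: float = 0.1) -> str:
    tokens = essay.split()
    tags = [t["entity"] for t in error_tagger(essay)]
    tags += ["O"] * max(0, len(tokens) - len(tags))   # crude pad-to-length alignment
    return " ".join(
        corrupt(tok, tag) if random.random() < rate else tok
        for tok, tag in zip(tokens, tags[: len(tokens)])
    )

essay = generate_essay("التعليم عن بعد", "B1")  # topic: "distance learning"
noisy = inject_errors(essay)                    # yields an (essay, noisy, level) example
```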


PeerQA: A Scientific Question Answering Dataset from Peer Reviews

arXiv.org Artificial Intelligence

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset of other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: Evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average size of 12k tokens. Our code and data are available at https://github.com/UKPLab/peerqa.
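One simple form of decontextualization consistent with the finding above is to prepend document-level context (title and section heading) to each passage before indexing. A minimal sketch using sentence-transformers; the model choice and record fields are assumptions, not necessarily PeerQA's actual baseline:

```python
# Sketch: index decontextualized passages and retrieve with dense similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

passages = [
    {"title": "Example Paper", "section": "4 Results",
     "text": "It improves recall by 12 points."},  # toy record
    # ... one record per paragraph of the article
]

def decontextualize(p: dict) -> str:
    # Restore document-level context lexically instead of rewriting the passage.
    return f"{p['title']} | {p['section']} | {p['text']}"

corpus_emb = model.encode([decontextualize(p) for p in passages],
                          convert_to_tensor=True)

def retrieve(question: str, k: int = 5):
    q_emb = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=k)[0]
    return [(passages[h["corpus_id"]], h["score"]) for h in hits]
```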


Emergent Word Order Universals from Cognitively-Motivated Language Models

arXiv.org Artificial Intelligence

The world's languages exhibit certain so-called typological or implicational universals; for example, Subject-Object-Verb (SOV) languages typically use postpositions. Explaining the source of such biases is a key goal of linguistics. We study word-order universals through a computational simulation with language models (LMs). Our experiments show that typologically-typical word orders tend to have lower perplexity estimated by LMs with cognitively plausible biases: syntactic biases, specific parsing strategies, and memory limitations. This suggests that the interplay of cognitive biases and predictability (perplexity) can explain many aspects of word-order universals. It also showcases the advantage of cognitively-motivated LMs, typically employed in cognitive modeling, in the simulation of language universals.
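The core measurement is straightforward to sketch: score counterfactual corpora that differ only in word order and compare perplexities. The snippet below uses GPT-2 purely as a stand-in, whereas the paper's LMs are cognitively motivated (syntactic biases, specific parsing strategies, memory limitations); the toy sentences are illustrative:

```python
# Sketch: compare LM perplexity across word-order variants of the same content.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in LM only
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(sentences: list[str]) -> float:
    nll, n_tokens = 0.0, 0
    with torch.no_grad():
        for s in sentences:
            ids = tok(s, return_tensors="pt").input_ids
            loss = lm(ids, labels=ids).loss            # mean NLL over shifted tokens
            n = ids.size(1) - 1
            nll += loss.item() * n
            n_tokens += n
    return math.exp(nll / n_tokens)

# Toy counterfactual corpora: identical content, different constituent order.
svo = ["the dog chased the ball", "the girl read the book"]
sov = ["the dog the ball chased", "the girl the book read"]
print(f"SVO ppl: {perplexity(svo):.1f}  SOV ppl: {perplexity(sov):.1f}")
```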


Grammatical Error Correction: A Survey of the State of the Art

arXiv.org Artificial Intelligence

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
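To make the artificial-error-generation focus concrete, here is a minimal sketch of rule-based noise injection that turns clean text into synthetic (source, target) training pairs; the specific rules and rates are illustrative, not drawn from any particular system in the survey:

```python
# Sketch: corrupt clean sentences to create synthetic GEC training pairs.
import random

PREPOSITIONS = {"in", "on", "at", "for", "to", "of", "with"}

def corrupt_token(tok: str) -> list[str]:
    r = random.random()
    if tok.lower() in PREPOSITIONS and r < 0.3:
        return []                                             # preposition deletion
    if len(tok) > 3 and r < 0.1:
        i = random.randrange(1, len(tok) - 1)
        return [tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]]  # char transposition
    if r < 0.15:
        return [tok, tok]                                     # accidental duplication
    return [tok]

def make_pair(clean: str) -> tuple[str, str]:
    noisy = [out for t in clean.split() for out in corrupt_token(t)]
    return " ".join(noisy), clean                             # (source, target)

print(make_pair("She has lived in London for ten years"))
```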


Analyzing Neural Discourse Coherence Models

arXiv.org Artificial Intelligence

Different theories have been proposed to describe the properties that contribute to discourse coherence, and some have been integrated with computational models for empirical evaluation. A popular approach is the entity-based model, which hypothesizes that coherence can be assessed in terms of the distribution of and transitions between entities in a text - by constructing an entity-grid (Egrid) representation (Barzilay and Lapata, 2005, 2008), building on Centering Theory (Grosz et al., 1995). Subsequent work has adapted and further extended Egrid representations (Filippova and Strube, 2007; Burstein et al., 2010; Elsner and Charniak, 2011; Guinaudeau and Strube, 2013). Other research has focused on syntactic patterns that cooccur in text (Louis and Nenkova,

Coherence models are typically evaluated on their ability to rank a well-organized document higher than its noisy counterparts created by corrupting sentence order in the original document (the binary discrimination task), and neural models have achieved remarkable accuracy on this task. Recent efforts have targeted additional tasks such as recovering the correct sentence order (Logeswaran et al., 2018; Cui et al., 2018), evaluating on realistic data (Lai and Tetreault, 2018; Farag and Yannakoudakis, 2019), and focusing on open-domain models of coherence (Li and Jurafsky, 2017; Xu et al., 2019). However, less attention has been directed to investigating and analyzing the properties of coherence that current models can capture, or what knowledge is encoded in their representations and how it might relate to aspects of coherence.
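As a concrete illustration of the entity-grid representation discussed above, the sketch below builds a toy grid in which rows are sentences, columns are entities, and cells record the entity's grammatical role (S/O/X) or absence (-). Real systems derive the roles from a syntactic parser; here they are supplied by hand:

```python
# Sketch: build a toy entity grid. Roles: S (subject), O (object), X (other);
# "-" marks absence. Roles here are hand-supplied rather than parser-derived.
from collections import defaultdict

def entity_grid(doc: list[list[tuple[str, str]]]) -> dict[str, list[str]]:
    """doc: per sentence, a list of (entity, role) pairs."""
    grid = defaultdict(lambda: ["-"] * len(doc))
    for i, sentence in enumerate(doc):
        for entity, role in sentence:
            grid[entity][i] = role
    return dict(grid)

doc = [
    [("judge", "S"), ("trial", "O")],    # "The judge opened the trial."
    [("judge", "S"), ("witness", "O")],  # "She questioned the witness."
    [("witness", "S")],                  # "The witness hesitated."
]
for entity, column in entity_grid(doc).items():
    print(f"{entity:8s} {' '.join(column)}")
# Column transition patterns (S->S, O->S, ...) are the features that
# entity-based coherence models extract from the grid.
```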


Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input

arXiv.org Artificial Intelligence

We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.
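A minimal sketch of the two ideas in the abstract: adversarial inputs built by permuting the sentences of a well-formed essay (grammatical but incoherent), and joint training that combines an essay-scoring loss with a local-coherence loss. The encoder architecture and loss weighting are placeholders, not the paper's exact configuration:

```python
# Sketch: (1) adversarial inputs via sentence permutation; (2) a joint model
# with an essay-scoring head and a local-coherence head trained together.
import random
import torch
import torch.nn as nn

def permute_sentences(essay: list[str]) -> list[str]:
    shuffled = essay[:]                  # grammatical sentences, incoherent order
    random.shuffle(shuffled)
    return shuffled

class JointScorer(nn.Module):
    def __init__(self, emb_dim: int = 300, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)  # stand-in encoder
        self.score_head = nn.Linear(hidden, 1)        # essay score
        self.coherence_head = nn.Linear(hidden, 1)    # local coherence score

    def forward(self, x):                             # x: (batch, tokens, emb_dim)
        _, (h, _) = self.encoder(x)
        h = h[-1]
        return self.score_head(h).squeeze(-1), self.coherence_head(h).squeeze(-1)

mse = nn.MSELoss()

def joint_loss(pred_score, gold_score, pred_coh, gold_coh, alpha: float = 0.5):
    # Weighted combination of the two objectives; alpha is a tunable placeholder.
    return alpha * mse(pred_score, gold_score) + (1 - alpha) * mse(pred_coh, gold_coh)
```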