Goto

Collaborating Authors

 essay quality


Context is Enough: Empirical Validation of $\textit{Sequentiality}$ on Essays

arXiv.org Artificial Intelligence

Recent work has proposed using Large Language Models (LLMs) to quantify narrative flow through a measure called sequentiality, which combines topic and contextual terms. A recent critique argued that the original results were confounded by how topics were selected for the topic-based component, and noted that the metric had not been validated against ground-truth measures of flow. That work proposed using only the contextual term as a more conceptually valid and interpretable alternative. In this paper, we empirically validate that proposal. Using two essay datasets with human-annotated trait scores, ASAP++ and ELLIPSE, we show that the contextual version of sequentiality aligns more closely with human assessments of discourse-level traits such as Organization and Cohesion. While zero-shot prompted LLMs predict trait scores more accurately than the contextual measure alone, the contextual measure adds more predictive value than both the topic-only and original sequentiality formulations when combined with standard linguistic features. Notably, this combination also outperforms the zero-shot LLM predictions, highlighting the value of explicitly modeling sentence-to-sentence flow. Our findings support the use of context-based sequentiality as a validated, interpretable, and complementary feature for automated essay scoring and related NLP tasks.


A School Student Essay Corpus for Analyzing Interactions of Argumentative Structure and Quality

arXiv.org Artificial Intelligence

Learning argumentative writing is challenging. Besides writing fundamentals such as syntax and grammar, learners must select and arrange argument components meaningfully to create high-quality essays. To support argumentative writing computationally, one step is to mine the argumentative structure. When combined with automatic essay scoring, interactions of the argumentative structure and quality scores can be exploited for comprehensive writing support. Although studies have shown the usefulness of using information about the argumentative structure for essay scoring, no argument mining corpus with ground-truth essay quality annotations has been published yet. Moreover, none of the existing corpora contain essays written by school students specifically. To fill this research gap, we present a German corpus of 1,320 essays from school students of two age groups. Each essay has been manually annotated for argumentative structure and quality on multiple levels of granularity. We propose baseline approaches to argument mining and essay scoring, and we analyze interactions between both tasks, thereby laying the ground for quality-oriented argumentative writing support.


Developing Component Scores from Natural Language Processing Tools to Assess Human Ratings of Essay Quality

AAAI Conferences

This study explores correlations between human ratings of essay quality and component scores based on similar natural language processing indices and weighted through a principal component analysis. The results demonstrate that such component scores show small to large effects with human ratings and thus may be suitable to providing both summative and formative feedback in an automatic writing evaluation systems such as those found in Writing-Pal.


Syntagmatic, Paradigmatic, and Automatic N-Gram Approaches to Assessing Essay Quality

AAAI Conferences

Computational indices related to n-gram production were developed in order to assess the potential for n-gram indices to predict human scores of essay quality. A regression analyses was conducted on a corpus of 313 argumentative essays. The analyses demonstrated that a variety of n-gram indices were highly correlated to essay quality, but were also highly correlated to the number of words in the text (although many of the n-gram indices were stronger predictors of writing quality than the number of words in a text). A second regression analysis was conducted on a corpus of 88 argumentative essays that were controlled for text length differences. This analysis demonstrated that n-gram indices were still strong predictors of essay quality when text length was not a factor.