Developing Pedagogically-Guided Threshold Algorithms for Intelligent Automated Essay Feedback

AAAI Conferences

Current computer-based tools for writing instruction show high scoring accuracy but uncertain instructional efficacy. One explanation is that these systems may not effectively communicate valid and appropriate formative feedback. In this paper, we describe an exploratory method for developing feedback algorithms that are grounded in writing pedagogy. The resulting threshold algorithms are shown to be meaningfully related to essay quality and informative regarding individualized, formative feedback for writers.
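
A minimal sketch of what a threshold-based feedback rule of this kind might look like is given below; the feature names, cut-off values, and feedback messages are hypothetical stand-ins for illustration, not the algorithms developed in the paper.

# Illustrative sketch: mapping computed essay features to formative feedback
# via fixed thresholds. Feature names, cut-offs, and messages are hypothetical.

ESSAY_FEATURES = {
    "word_count": 215,        # assumed output of an upstream text analyzer
    "paragraph_count": 2,
    "cohesion_score": 0.31,   # assumed 0-1 scale
}

# Each rule: (feature, minimum acceptable value, feedback shown when below it)
THRESHOLD_RULES = [
    ("word_count", 300, "Try to develop your ideas further; aim for a longer draft."),
    ("paragraph_count", 4, "Consider splitting your essay into more paragraphs."),
    ("cohesion_score", 0.40, "Add transitions that connect your ideas across sentences."),
]

def formative_feedback(features):
    """Return the feedback messages whose thresholds the essay does not meet."""
    return [message
            for name, minimum, message in THRESHOLD_RULES
            if features.get(name, 0) < minimum]

if __name__ == "__main__":
    for line in formative_feedback(ESSAY_FEATURES):
        print("-", line)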


Internal Usability Testing of Automated Essay Feedback in an Intelligent Writing Tutor

AAAI Conferences

Research on automated essay scoring (AES) indicates that computer-generated essay ratings are comparable to human ratings. However, despite investigations into the accuracy and reliability of AES scores, less attention has been paid to the feedback delivered to the students. This paper presents a method developers can use to quickly evaluate the usability of an automated feedback system prior to testing with students. Using this method, researchers evaluated the feedback provided by the Writing-Pal, an intelligent tutor for writing strategies. Lessons learned and potential for future research are discussed.


Automated Essay Evaluation: The Criterion Online Writing Service

AI Magazine

In this article, we describe a deployed educational technology application: the Criterion Online Essay Evaluation Service, a web-based system that provides automated scoring and evaluation of student essays. Criterion has two complementary applications: (1) Critique Writing Analysis Tools, a suite of programs that detect errors in grammar, usage, and mechanics, that identify discourse elements in the essay, and that recognize potentially undesirable elements of style, and (2) e-rater version 2.0, an automated essay scoring system. Critique and e-rater provide students with feedback that is specific to their writing in order to help them improve their writing skills and is intended to be used under the instruction of a classroom teacher. Both applications employ natural language processing and machine learning techniques. All of these capabilities outperform baseline algorithms, and some of the tools agree with human judges in their evaluations as often as two judges agree with each other.
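
As a rough illustration of the scoring pipeline this description implies (extract linguistic features from an essay, then map them to a holistic score with a statistical model), the sketch below fits a simple linear regression over a few hypothetical surface features. The feature set, the toy data, and the use of scikit-learn are illustrative assumptions, not Criterion's or e-rater's actual implementation.

# Illustrative sketch of feature-based holistic scoring: surface features are
# extracted from each essay and a linear model maps them to human scores.
# The features and model here are assumptions, not e-rater's actual design.
import re
import numpy as np
from sklearn.linear_model import LinearRegression

def extract_features(essay: str) -> list[float]:
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return [
        len(words),                                               # essay length
        len(set(w.lower() for w in words)) / max(len(words), 1),  # type-token ratio
        len(words) / max(len(sentences), 1),                      # mean sentence length
    ]

# Toy training data: essays paired with human holistic scores (1-6 scale).
train_essays = ["Short essay. Few ideas.",
                "A longer essay that develops its thesis across several sentences. "
                "It offers examples. It also connects them back to the main claim."]
train_scores = [2, 4]

model = LinearRegression()
model.fit(np.array([extract_features(e) for e in train_essays]), train_scores)

new_essay = "Another student essay to be scored automatically."
predicted = model.predict(np.array([extract_features(new_essay)]))[0]
print(f"Predicted holistic score: {predicted:.1f}")

In practice a system of this kind would be trained on large sets of human-scored essays and far richer linguistic features; the sketch only shows the feature-extraction-plus-statistical-model shape of the approach.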


Writing Quality, Knowledge, and Comprehension Correlates of Human and Automated Essay Scoring

AAAI Conferences

Automated essay scoring tools are often criticized on the basis of construct validity. Specifically, it has been argued that computational scoring algorithms may not align with higher-level indicators of quality writing, such as writers’ demonstrated knowledge and understanding of the essay topics. In this paper, we consider how and whether the scoring algorithms within an intelligent writing tutor correlate with measures of writing proficiency and students’ general knowledge, reading comprehension, and vocabulary skill. Results indicate that the computational algorithms, although less attuned to knowledge and comprehension factors than human raters, were marginally related to such variables. Implications for improving automated scoring and intelligent tutoring of writing are briefly discussed.
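
The kind of analysis the abstract describes can be made concrete with a short correlation sketch like the one below; the scores, external measures, and sample size are invented for illustration and are not data from the study.

# Illustrative sketch: correlating automated essay scores with external
# student measures. All numbers and variable names here are hypothetical.
from scipy.stats import pearsonr

automated_scores = [3.1, 4.2, 2.8, 3.9, 4.5, 2.5]
human_scores     = [3.0, 4.0, 3.0, 4.0, 5.0, 2.0]
reading_comp     = [55, 70, 48, 66, 74, 40]   # e.g., test percentile
vocabulary_skill = [52, 68, 50, 71, 69, 45]

for label, measure in [("human ratings", human_scores),
                       ("reading comprehension", reading_comp),
                       ("vocabulary skill", vocabulary_skill)]:
    r, p = pearsonr(automated_scores, measure)
    print(f"automated score vs. {label}: r = {r:.2f}, p = {p:.3f}")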