Roscoe, Rod D. (Arizona State University) | Crossley, Scott A. (Georgia State University) | Snow, Erica L. (Arizona State University) | Varner, Laura K. (Arizona State University) | McNamara, Danielle S. (Arizona State University)
Automated essay scoring tools are often criticized on the basis of construct validity. Specifically, it has been argued that computational scoring algorithms may be misaligned with higher-level indicators of quality writing, such as writers' demonstrated knowledge and understanding of the essay topics. In this paper, we consider whether and how the scoring algorithms within an intelligent writing tutor correlate with measures of writing proficiency and students' general knowledge, reading comprehension, and vocabulary skill. Results indicate that the computational algorithms, although less attuned to knowledge and comprehension factors than human raters, were marginally related to such variables. Implications for improving automated scoring and intelligent tutoring of writing are briefly discussed.
Automated essay scoring (AES) is a broadly used application of machine learning, with a long history of real-world use that impacts high-stakes decision-making for students. However, defensibility arguments in this space have typically been rooted in hand-crafted features and psychometrics research, which are a poor fit for recent advances in AI research and more formative classroom use of the technology. This paper proposes a framework for evaluating automated essay scoring models trained with more modern algorithms, used in a classroom setting; that framework is then applied to evaluate an existing product, Turnitin Revision Assistant.
Alur, Rajeev (University of Pennsylvania) | D'Antoni, Loris (University of Pennsylvania) | Gulwani, Sumit (Microsoft Research) | Kini, Dileep (University of Illinois at Urbana-Champaign) | Viswanathan, Mahesh (University of Illinois at Urbana-Champaign)
One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton (DFA) from a given description of its language. We focus on how to assign partial grades for incorrect answers. Each student's answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mistakes, we compute the edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading the problem description. For this purpose, we consider a description language Mosel, which adds syntactic sugar to the classical Monadic Second Order Logic and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming Mosel descriptions into DFAs and vice versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students, comparing the grades and feedback computed by our tool with those of human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over human graders for both scalability and consistency.
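The second technique above compares the languages of the student and reference DFAs. As a minimal sketch (not the paper's implementation, which uses entropy over the symmetric difference), the idea of scoring by the fraction of disagreeing strings can be illustrated by enumerating all strings up to a bounded length and checking where the two automata differ; the DFA encoding and the `disagreement_fraction` helper here are illustrative choices, not the authors' code:

```python
from itertools import product

# A DFA is encoded as (states, alphabet, delta, start, accepting),
# where delta maps (state, symbol) -> state.

def accepts(dfa, word):
    """Run the DFA on a word and report acceptance."""
    _states, _alphabet, delta, start, accepting = dfa
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting

def disagreement_fraction(dfa_a, dfa_b, max_len):
    """Fraction of strings of length <= max_len on which the DFAs differ.

    This is a brute-force stand-in for a proper symmetric-difference
    (product-automaton) construction, usable only for small max_len.
    """
    alphabet = sorted(dfa_a[1])
    total = wrong = 0
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            total += 1
            if accepts(dfa_a, word) != accepts(dfa_b, word):
                wrong += 1
    return wrong / total

# Reference: strings over {a, b} with an even number of 'a's.
ref = ({0, 1}, {'a', 'b'},
       {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
       0, {0})
# Student answer: same transitions, but accepting an odd number of 'a's,
# i.e. the complement language -- the two disagree on every string.
student = (ref[0], ref[1], ref[2], 0, {1})

print(disagreement_fraction(ref, student, 6))  # -> 1.0
```

A production grader would instead build the product automaton of the two DFAs, accepting exactly the symmetric difference, so that disagreement can be measured over all string lengths rather than a bounded prefix of the language.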
In this article, we describe a deployed educational technology application: the Criterion Online Essay Evaluation Service, a web-based system that provides automated scoring and evaluation of student essays. Criterion has two complementary applications: (1) Critique Writing Analysis Tools, a suite of programs that detect errors in grammar, usage, and mechanics, identify discourse elements in the essay, and recognize potentially undesirable elements of style, and (2) e-rater version 2.0, an automated essay scoring system. Critique and e-rater provide students with feedback that is specific to their writing in order to help them improve their writing skills; the system is intended to be used under the instruction of a classroom teacher. All of these capabilities outperform baseline algorithms, and some of the tools agree with human judges in their evaluations as often as two judges agree with each other.
Databricks helps hundreds of organizations use Apache Spark to answer important questions by analyzing data. Apache Spark is an open-source data processing engine for engineers and analysts that includes an optimized general execution runtime and a set of standard libraries for building data pipelines, advanced algorithms, and more. Besides contributing over 75% of Apache Spark's design and development work, Databricks has also trained a large number of Spark developers. In less than three years, over 30,000 students have completed a Databricks training course about Apache Spark, and over half of those students received the training for free. Last year, Databricks worked with professors from the University of California, Berkeley, and the University of California, Los Angeles, to produce two free Massive Open Online Courses (MOOCs).