A judge in Northern California dealt a blow this week to a controversial campaign to make teachers more accountable for their students' level of achievement, the second key setback in recent months for those behind the effort. The ruling by Contra Costa County Superior Court Judge Barry Goode went against the Bay Area group Students Matter. The group's lawsuit aimed to force 13 school districts, including seven in Southern California, to make student standardized test scores a key part of teacher evaluations. Students Matter had hoped to build on a 2012 ruling against the Los Angeles Unified School District, which led to a settlement under which test scores were supposed to become part of teacher evaluations. But in Doe vs. Antioch, the case decided this week, the judge concluded that districts had broad discretion over how to use test results.
In this article, we describe a deployed educational technology application: the Criterion Online Essay Evaluation Service, a web-based system that provides automated scoring and evaluation of student essays. Criterion has two complementary applications: (1) Critique Writing Analysis Tools, a suite of programs that detect errors in grammar, usage, and mechanics, identify discourse elements in the essay, and recognize potentially undesirable elements of style; and (2) e-rater version 2.0, an automated essay scoring system. Critique and e-rater give students feedback specific to their own writing to help them improve their writing skills; the system is intended to be used under the instruction of a classroom teacher. All of these capabilities outperform baseline algorithms, and some of the tools agree with human judges in their evaluations as often as two judges agree with each other.
We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating such questions is difficult without trading away originality, relevance, or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice that aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions through an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data alongside existing questions, we observe accuracy improvements on real science exams.
Roscoe, Rod (University of Memphis) | Varner, Laura (University of Memphis) | Cai, Zhiqiang (University of Memphis) | Weston, Jennifer (University of Memphis) | Crossley, Scott (Georgia State University) | McNamara, Danielle (University of Memphis)
Research on automated essay scoring (AES) indicates that computer-generated essay ratings are comparable to human ratings. However, despite investigations into the accuracy and reliability of AES scores, less attention has been paid to the feedback delivered to students. This paper presents a method developers can use to quickly evaluate the usability of an automated feedback system prior to testing with students. Using this method, researchers evaluated the feedback provided by the Writing-Pal, an intelligent tutor for writing strategies. Lessons learned and avenues for future research are discussed.