Computerized cross-language plagiarism detection has recently become essential. Because scientific publications in Bahasa Indonesia are scarce, many Indonesian authors frequently consult English-language publications in order to increase the number of scientific publications in Bahasa Indonesia, a number that is currently rising. Because the syntax of Bahasa Indonesia differs from that of English, most existing methods for automated cross-language plagiarism detection do not provide satisfactory results. The experiments showed that the best accuracy, 87%, was achieved with a document size of 6 words, and that the defined document size must be kept below 10 words to maintain high accuracy.
Anthony Goldbloom is cofounder and CEO of Kaggle, a platform for machine-learning competitions. Almost 500,000 of the world's top data scientists compete on Kaggle to solve important problems for industry, government, and academia. Kaggle has catalyzed breakthroughs in areas ranging from automated essay grading to automated disease diagnosis from medical images. Before cofounding Kaggle in 2010, Anthony was an econometrician at the Australian Treasury.
You can search Google for pictures similar to a given image, for example to detect plagiarism or to find people who look like you. Google correctly recognized that figure 1 shows Vincent Granville, and indeed the first picture it returned was the one from figure 1. However, Google displayed search results (images) related to Vincent Granville rather than pictures similar to the one I uploaded: images associated with my blog posts on Data Science Central, or images of another person who shares the name Vincent Granville (incidentally, a friend of mine). My daughter seemed to have been completely ignored by Google's algorithm, but at least Google's image search did not rely on metadata this time.
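The paragraph contrasts results based on who is in a picture with results based on what the picture looks like. To illustrate only the second idea, here is a minimal average-hash sketch in Python. It is not Google's algorithm, which the post does not describe; it assumes each image has already been reduced to a small grayscale grid of integers, and the function names and sample grids are hypothetical.

```python
# Minimal average-hash ("aHash") sketch: compares images by pixel content,
# not by metadata. Not Google's algorithm; all names and data are hypothetical.


def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0.

    Assumes `pixels` is a small grayscale grid (list of equal-length rows),
    e.g. an image that has already been downscaled to 8x8.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(h1, h2):
    """Count the positions where two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least visually similar."""
    query_hash = average_hash(query)
    return sorted(
        candidates,
        key=lambda name: hamming_distance(query_hash, average_hash(candidates[name])),
    )


# Hypothetical 4x4 grayscale "images".
uploaded = [[10, 20, 200, 210], [15, 25, 205, 215], [12, 22, 198, 220], [11, 21, 202, 212]]
brighter_copy = [[p + 10 for p in row] for row in uploaded]  # same picture, brighter
inverted = [[255 - p for p in row] for row in uploaded]      # very different picture

print(rank_by_similarity(uploaded, {"inverted": inverted, "brighter_copy": brighter_copy}))
# prints ['brighter_copy', 'inverted']
```

The brightened copy hashes identically to the upload, while the inverted image differs in every bit, so ranking depends only on pixel content. A real system would first resize and grayscale actual image files, which this sketch leaves out.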
In this article, we describe a deployed educational technology application: the Criterion Online Essay Evaluation Service, a web-based system that provides automated scoring and evaluation of student essays. Criterion has two complementary applications: (1) Critique Writing Analysis Tools, a suite of programs that detect errors in grammar, usage, and mechanics, identify discourse elements in the essay, and recognize potentially undesirable elements of style; and (2) e-rater version 2.0, an automated essay scoring system. Critique and e-rater give students feedback specific to their own writing to help them improve their writing skills, and the service is intended to be used under the instruction of a classroom teacher. All of these capabilities outperform baseline algorithms, and some of the tools agree with human judges in their evaluations as often as two judges agree with each other.
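The final claim compares system-versus-human agreement with human-versus-human agreement. The paragraph does not say which agreement measure was used, so the sketch below assumes two measures common in essay-scoring evaluation, exact agreement and adjacent (within one point) agreement, and applies them to made-up scores on a 1-6 scale. The function names and data are hypothetical.

```python
# Sketch of two rater-agreement measures for essay scores.
# The choice of measures is an assumption; the scores are made up.


def _check(a, b):
    """Reject empty or mismatched score lists."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and equally long")


def exact_agreement(a, b):
    """Fraction of essays on which both raters gave the identical score."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b, tolerance=1):
    """Fraction of essays on which the two scores differ by at most `tolerance`."""
    _check(a, b)
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


# Hypothetical holistic scores for eight essays.
judge_1 = [4, 3, 5, 2, 6, 4, 3, 5]
judge_2 = [4, 3, 4, 2, 5, 4, 3, 5]
system = [4, 3, 5, 3, 6, 4, 2, 5]

print("judge vs judge: ", exact_agreement(judge_1, judge_2), adjacent_agreement(judge_1, judge_2))
print("system vs judge:", exact_agreement(system, judge_1), adjacent_agreement(system, judge_1))
```

In this invented data the system matches judge 1 exactly as often as the two judges match each other (0.75), which is the kind of comparison the paragraph describes. Chance-corrected statistics such as quadratic weighted kappa are another common choice.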