A system created by MIT researchers could be used to automatically correct factual inconsistencies in Wikipedia articles, reducing the time and effort spent by human editors who now do the task manually. Wikipedia comprises millions of articles that are in constant need of edits to reflect new information. That can involve article expansions, major rewrites, or more routine modifications such as updating numbers, dates, names, and locations. Currently, humans across the globe volunteer their time to make these edits. In a paper being presented at the AAAI Conference on Artificial Intelligence, the researchers describe a text-generating system that pinpoints and replaces specific information in relevant Wikipedia sentences, while keeping the language similar to how humans write and edit.
A new "text-generating system" created by researchers at the Massachusetts Institute of Technology may be the beginning of the end for all human editing jobs. The system, announced in a press release Wednesday, is able to rummage through millions of Wikipedia pages, sniff around for outdated data, and replace it with the most recent information available on the internet in a "human-like" style, potentially making the need for real, hot-blooded editors basically obsolete.
A computer system has been developed that scans through a Wikipedia article and locates, checks, and corrects any factual errors automatically. This AI-powered system can keep sentences up to date and save human editors the hassle, while maintaining a human tone in the writing. The technology was created at MIT and would allow for efficient and accurate updates to Wikipedia's 52 million articles. For example, say there's a required update to this sentence: 'Fund A considers 28 of their 42 minority stakeholdings in operationally active companies to be of particular significance to the group.'
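The kind of edit the Fund A sentence calls for, swapping in updated figures while leaving the surrounding wording intact, can be illustrated with a simple token-level diff. The sketch below is not the MIT model (which is a learned neural system); it is a minimal stand-in that shows the update behavior, using Python's standard `difflib` to keep shared tokens from the original sentence and take disagreeing spans from the new claim.

```python
import difflib

def update_sentence(outdated: str, claim: str) -> str:
    """Splice updated facts from a claim into an outdated sentence.

    Toy stand-in for the neural fusion model described above:
    tokens shared by both sentences are kept from the original,
    and spans that disagree are taken from the claim.
    """
    old_tokens = outdated.split()
    new_tokens = claim.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old_tokens[i1:i2])   # keep original wording
        elif op in ("replace", "insert"):
            out.extend(new_tokens[j1:j2])   # take updated facts
        # "delete": drop tokens absent from the claim
    return " ".join(out)

old = "Fund A considers 28 of their 42 minority stakeholdings to be significant."
new = "Fund A considers 23 of their 43 minority stakeholdings to be significant."
print(update_sentence(old, new))
```

A real system must also handle claims that are phrased very differently from the target sentence, which is where the learned model earns its keep; this diff-based version only works when the two sentences share most of their wording.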
The English Wikipedia has over 6 million articles, and the combined Wikipedias for all languages contain over 28 billion words in 52 million articles across 309 languages. It's an incomparably valuable resource for knowledge seekers, needless to say, but one that requires pruning by more than 132,000 registered active monthly editors. In search of an autonomous solution, researchers at MIT developed an AI and machine learning system that fixes inconsistencies in Wikipedia articles. Thanks to a family of algorithms, it's able to identify the latest information from around the web and use it to produce rewritten sentences for the corresponding articles that reflect the updated information. The algorithms in question were trained on a data set containing pairs of sentences, in which one sentence is a claim and the other is a relevant Wikipedia sentence.
We present a novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia's article revision history. By comparing adjacent versions of the same article, we extract voluminous training data for tasks for which data is usually scarce or costly to obtain. We illustrate this paradigm by applying it to three separate text processing tasks at various levels of linguistic granularity. We first apply this approach to the collection of textual errors and their correction, focusing on the specific type of lexical errors known as "eggcorns". Second, moving up to the sentential level, we show how to mine Wikipedia revisions for training sentence compression algorithms. By dramatically increasing the size of the available training data, we are able to create more discerning lexicalized models, providing improved compression results. Finally, moving up to the document level, we present some preliminary ideas on how to use the Wikipedia data to bootstrap text summarization systems. We propose to use a sentence's persistence throughout a document's evolution as an indicator of its fitness as part of an extractive summary.
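The mining paradigm the abstract describes, comparing adjacent versions of the same article to harvest training pairs, can be sketched in a few lines. The function below is an illustrative assumption of how such extraction might look, not the authors' implementation: it aligns the sentences of two revisions with Python's `difflib` and keeps one-to-one edits as (before, after) pairs. The sentence splitter is deliberately naive (period plus space) for brevity.

```python
import difflib

def mine_revision_pairs(old_rev: str, new_rev: str) -> list[tuple[str, str]]:
    """Extract (before, after) sentence pairs from two adjacent revisions.

    Sentences that were modified between the versions become training
    pairs; unchanged sentences are skipped. A production system would
    use a proper sentence segmenter and filter vandalism/reverts.
    """
    old_sents = [s.strip() for s in old_rev.split(". ") if s.strip()]
    new_sents = [s.strip() for s in new_rev.split(". ") if s.strip()]
    matcher = difflib.SequenceMatcher(a=old_sents, b=new_sents)
    pairs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace" and (i2 - i1) == (j2 - j1):
            # one-to-one edits: each old sentence aligned to its rewrite
            pairs.extend(zip(old_sents[i1:i2], new_sents[j1:j2]))
    return pairs

old = "Cats sleep a lot. The fund holds 28 stakes. Dogs bark."
new = "Cats sleep a lot. The fund holds 30 stakes. Dogs bark."
print(mine_revision_pairs(old, new))
```

Applied across a full revision history, this yields the kind of voluminous, naturally occurring training data the abstract argues is otherwise scarce or costly to obtain, for error correction, sentence compression, and summarization alike.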