Learning from the Web: Extracting General World Knowledge from Noisy Text

Gordon, Jonathan (University of Rochester) | Durme, Benjamin Van (Johns Hopkins University) | Schubert, Lenhart K. (University of Rochester)

AAAI Conferences 

The quality and nature of knowledge that can be found by an automated knowledge-extraction system depends on its inputs. For systems that learn by reading text, the Web offers a breadth of topics and currency, but it also presents the problems of dealing with casual, unedited writing, non-textual inputs, and the mingling of languages. The results of extraction using the KNEXT system on two Web corpora — Wikipedia and a collection of weblog entries — indicate that, with automatic filtering of the output, even ungrammatical writing on arbitrary topics can yield an extensive knowledge base, which human judges find to be of good quality, with propositions receiving an average score across both corpora of 2.34 (where the range is 1 to 5 and lower is better) versus 3.00 for unfiltered output from the same sources.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found