Learning from the Web: Extracting General World Knowledge from Noisy Text

Gordon, Jonathan (University of Rochester) | Van Durme, Benjamin (Johns Hopkins University) | Schubert, Lenhart K. (University of Rochester)

Jul-8-2010–AAAI Conferences

The quality and nature of the knowledge that an automated knowledge-extraction system can find depend on its inputs. For systems that learn by reading text, the Web offers breadth of topics and currency, but it also presents the problems of casual, unedited writing, non-textual inputs, and the mingling of languages. The results of extraction using the KNEXT system on two Web corpora (Wikipedia and a collection of weblog entries) indicate that, with automatic filtering of the output, even ungrammatical writing on arbitrary topics can yield an extensive knowledge base that human judges find to be of good quality. Filtered propositions received an average score across both corpora of 2.34 (on a scale of 1 to 5, where lower is better), versus 3.00 for unfiltered output from the same sources.

factoid, knowledge, wikipedia

Country:
- North America > United States
  - Maryland > Baltimore (0.04)
  - New York > Monroe County
    - Rochester (0.04)
- Asia > Middle East
  - Bahrain > Capital Governorate > Manama (0.04)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language > Text Processing (0.69)
    - Representation & Reasoning > Expert Systems (0.49)
