Automatically constructing a dictionary for information extraction tasks

Feb-1-1993–Classics

Knowledge-based natural language processing systems have achieved good success with certain tasks but they are often criticized because they depend on a domain-specific dictionary that requires a great deal of manual knowledge engineering. This knowledge engineering bottleneck makes knowledge-based NLP systems impractical for real-world applications because they cannot be easily scaled up or ported to new domains. In response to this problem, we developed a system called AutoSlog that automatically builds a domain-specific dictionary of concepts for extracting information from text. Using AutoSlog, we constructed a dictionary for the domain of terrorist event descriptions in only 5 person-hours. We then compared the AutoSlog dictionary with a handcrafted dictionary that was built by two highly skilled graduate students and required approximately 1500 person-hours of effort. We evaluated the two dictionaries using two blind test sets of 100 texts each. Overall, the AutoSlog dictionary achieved 98% of the performance of the handcrafted dictionary. On the first test set, the AutoSlog dictionary obtained 96.3% of the performance of the handcrafted dictionary. On the second test set, the overall scores were virtually indistinguishable with the AutoSlog dictionary achieving 99.7% of the performance of the handcrafted dictionary.
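
The figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the abstract does not say how the overall 98% figure was computed. Taking an unweighted mean of the two per-test-set percentages is an assumption, made here because both test sets contain 100 texts.

```python
# Figures quoted in the abstract.
AUTOSLOG_HOURS = 5          # person-hours to build the AutoSlog dictionary
HANDCRAFTED_HOURS = 1500    # approximate person-hours for the handcrafted dictionary
RELATIVE_PERFORMANCE = [96.3, 99.7]  # % of handcrafted performance, test sets 1 and 2


def effort_ratio(handcrafted_hours: float, autoslog_hours: float) -> float:
    """Return how many times more effort the handcrafted dictionary required."""
    return handcrafted_hours / autoslog_hours


def mean_relative_performance(percentages: list[float]) -> float:
    """Unweighted mean of per-test-set percentages.

    Assumption: the abstract does not state how its overall figure was
    derived; an unweighted mean is used here for illustration only.
    """
    return sum(percentages) / len(percentages)


if __name__ == "__main__":
    print(effort_ratio(HANDCRAFTED_HOURS, AUTOSLOG_HOURS))
    print(round(mean_relative_performance(RELATIVE_PERFORMANCE), 1))
```

Under that assumption, the mean of the two per-set figures agrees with the reported overall 98%, and the handcrafted dictionary took roughly 300 times the effort.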

artificial intelligence, expert system, natural language

Country:
- South America (0.04)
- North America
  - Central America (0.04)
  - United States
    - Massachusetts
      - Hampshire County > Amherst (0.14)
      - Suffolk County > Boston (0.04)
    - California > San Mateo County
      - San Mateo (0.04)

Industry:
- Law Enforcement & Public Safety > Terrorism (0.72)
- Government (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Expert Systems (1.00)
  - Natural Language
    - Information Extraction (1.00)
    - Text Processing (0.95)
    - Grammars & Parsing (0.68)
