Wikipedia-based Semantic Interpretation for Natural Language Processing
Gabrilovich, E., Markovitch, S.
Journal of Artificial Intelligence Research
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
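The idea of representing text as a weighted vector of concepts and comparing texts in that concept space can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the four-entry `CONCEPTS` dictionary is a hypothetical stand-in for Wikipedia articles, the TF-IDF weighting and cosine comparison are common choices rather than confirmed details of the authors' implementation, and the function names (`build_inverted_index`, `interpret`, `relatedness`) are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for Wikipedia: concept title -> article text.
# A real system would index a full Wikipedia dump instead.
CONCEPTS = {
    "Bank": "bank money loan deposit interest account finance",
    "River": "river water stream bank flow fish",
    "Computer": "computer software hardware program data processor",
    "Music": "music song melody instrument guitar concert",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(concepts):
    """Map each word to {concept: TF-IDF weight} (assumed weighting scheme)."""
    n = len(concepts)
    term_counts = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    doc_freq = Counter(w for counts in term_counts.values() for w in counts)
    index = {}
    for concept, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(n / doc_freq[word])
            if idf > 0:  # words found in every concept carry no signal
                index.setdefault(word, {})[concept] = tf * idf
    return index


def interpret(text, index):
    """Represent a text as a weighted vector over concepts."""
    vector = Counter()
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vector[concept] += count * weight
    return dict(vector)


def relatedness(text_a, text_b, index):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(text_a, index), interpret(text_b, index)
    dot = sum(w * vb.get(c, 0.0) for c, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


INDEX = build_inverted_index(CONCEPTS)
```

In this toy setup, `relatedness("loan interest", "money deposit", INDEX)` is high even though the two fragments share no words, because both map onto the same concept. Each dimension of the vector is a named concept, so the result of `interpret` can be read directly by a person, which is the explainability property the abstract points to.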
Mar-30-2009
- Country:
- Atlantic Ocean (0.04)
- South America
- Brazil (0.04)
- Peru > Loreto Department (0.04)
- Oceania > Australia
- South Australia > Adelaide (0.04)
- New South Wales > Sydney (0.04)
- North America
- Canada (0.04)
- United States
- District of Columbia > Washington (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Oklahoma > Oklahoma County
- Oklahoma City (0.04)
- New York > New York County
- New York City (0.04)
- Massachusetts
- Middlesex County > Cambridge (0.04)
- Suffolk County > Boston (0.04)
- California > Santa Clara County
- Santa Clara (0.04)
- Europe
- Italy (0.04)
- United Kingdom
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- England > Cambridgeshire
- Cambridge (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Netherlands > South Holland
- Dordrecht (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Germany > Baden-Württemberg
- Karlsruhe Region > Heidelberg (0.04)
- France > Île-de-France
- Asia
- South Korea (0.04)
- Middle East
- Iraq (0.28)
- Israel > Haifa District
- Haifa (0.04)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (0.67)
- Industry:
- Leisure & Entertainment (1.00)
- Information Technology (1.00)
- Media > Film (1.00)
- Banking & Finance (1.00)
- Law (0.92)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.92)
- Consumer Products & Services (0.92)
- Automobiles & Trucks (0.92)
- Transportation (0.67)
- Education (0.67)
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Consumer Health (1.00)
- Therapeutic Area
- Oncology (1.00)
- Immunology (1.00)
- Hematology (1.00)
- Government