Automatic Reduction of a Document-Derived Noun Vocabulary
Anderson, Sven (Bard College) | Thomas, S. Rebecca (Bard College) | Segal, Camden (Bard College) | Wu, Yu (Stanford University)
We propose and evaluate five related algorithms that automatically derive limited-size noun vocabularies from text documents of 2,000-30,000 words. The proposed algorithms combine Personalized PageRank with principles of information maximization and are applied to the WordNet graph for nouns. For the best-performing algorithm, the difference between automatically generated reduced noun lexicons and those created by human writers is approximately 1-2 WordNet edges per lexical item. Our results also indicate the importance of performing word-sense disambiguation with sentence-level context information at the earliest stage of analysis.
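As a rough illustration of the Personalized PageRank component only, the sketch below runs power iteration on a tiny hand-made hypernym graph, with the random walk restarting at the nouns observed in a document. It is not one of the five algorithms from the paper: the toy graph, the seed counts, the function name `personalized_pagerank`, the damping value, and the final "keep the top three nodes" step are all assumptions made for illustration, and the information-maximization criterion is not modeled.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (node, node) pairs, treated as undirected.
    seeds: dict mapping seed nodes to non-negative weights; every seed
           is assumed to appear in at least one edge.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # The teleport distribution is concentrated on the seed nodes.
    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes back to the seeds...
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        # ...and the rest is split evenly among each node's neighbors.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Hypothetical toy stand-in for a fragment of the WordNet noun hierarchy.
TOY_EDGES = [
    ("entity", "animal"),
    ("entity", "artifact"),
    ("animal", "dog"),
    ("animal", "cat"),
    ("artifact", "vehicle"),
    ("vehicle", "car"),
]

# Hypothetical noun counts from a document.
seed_counts = {"dog": 3, "cat": 2, "car": 1}

scores = personalized_pagerank(TOY_EDGES, seed_counts)

# Keep the three highest-scoring nodes as a (toy) reduced vocabulary.
vocabulary = sorted(scores, key=scores.get, reverse=True)[:3]
```

Because the walk restarts at the document's nouns, score accumulates at nearby nodes in the hierarchy, so a shared ancestor such as "animal" can outrank a more distant node such as "artifact". A real pipeline would build the graph from WordNet noun synsets after word-sense disambiguation rather than from a hand-written edge list.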
May-18-2011
- Country:
- North America > United States > California > Santa Clara County (0.14)
- Genre:
- Research Report > New Finding (0.48)
- Technology: