Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

Neural Information Processing Systems 

We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be "intermediate" for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children.
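
The graph representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_graph`, `shared_subpaths`, `BEGIN`, and `END`, and the toy corpus, are hypothetical, and a plain repeat count of word sequences stands in for the paper's recursive context-sensitive statistical inference, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical sentinel vertices marking where every sentence path starts and ends.
BEGIN, END = "<begin>", "<end>"


def build_graph(sentences):
    """Represent each sentence as a path over word vertices.

    Returns (edges, paths): edges maps (src, dst) to the number of
    sentence paths that traverse that edge; paths lists each sentence's
    vertex sequence.
    """
    edges = defaultdict(int)
    paths = []
    for sentence in sentences:
        path = [BEGIN] + sentence.lower().split() + [END]
        paths.append(path)
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return dict(edges), paths


def shared_subpaths(paths, length, min_count=2):
    """Count sub-paths of a given length that occur at least min_count times.

    Assumption: raw frequency is only a crude stand-in for the
    significance criterion used in the paper.
    """
    counts = defaultdict(int)
    for path in paths:
        for i in range(len(path) - length + 1):
            counts[tuple(path[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Toy corpus, invented for illustration only.
corpus = ["the cat is here", "the dog is here", "where is the cat"]
edges, paths = build_graph(corpus)
candidates = shared_subpaths(paths, length=2)
```

In this toy corpus the sub-path `("is", "here")` is shared by two sentence paths and so appears in `candidates`, while `("the", "dog")` occurs once and does not. In the approach the abstract describes, a pattern judged significant would become a new vertex of the graph; that rewiring step is omitted here.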
