Children are adept at discovering word boundaries and, in tandem, using those words to build higher-level structures. Current research treats lexical acquisition and grammar induction as two distinct tasks; doing so has led to unreasonable assumptions. State-of-the-art unsupervised results presuppose a perfectly segmented, noise-free lexicon, while largely ignoring how the lexicon is used. This paper combines both tasks in a novel framework for bootstrapping lexical acquisition and grammar induction.
Past research on grammar induction has found promising results in predicting parts-of-speech from n-grams using a fixed vocabulary and a fixed context. In this study, we investigated grammar induction while varying both vocabulary size and context size. Results indicated that as context increased for a fixed vocabulary, overall accuracy initially increased but then leveled off. Importantly, this increase in accuracy did not occur at the same rate across all syntactic categories. Using an unsupervised methodology, we also address the dynamic relation between context and vocabulary in grammar induction, and we formulate a model that represents the relationship between vocabulary and context. Our results concur with what has been called the word spurt phenomenon in the child language acquisition literature.
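The core idea behind predicting parts-of-speech from n-gram contexts can be illustrated with a minimal, hypothetical sketch: words that occupy similar slots (similar left/right neighbors) end up with similar context distributions, so distributional similarity can group them into induced categories without labels. The corpus, window size, and similarity measure below are illustrative assumptions, not the cited study's actual setup.

```python
from collections import Counter, defaultdict


def context_vectors(tokens, window=1):
    """Map each word type to a bag of its positional context words
    (e.g. (-1, 'the') means 'the' appeared one slot to the left)."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vecs[w][(j - i, tokens[j])] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


# Toy corpus (an assumption for illustration only).
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "a cat ran on a mat . a dog ran on a rug .").split()
vecs = context_vectors(corpus)

# Two nouns share contexts far more than a noun and a verb do,
# which is the distributional signal a category inducer exploits.
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["sat"]))
```

Enlarging the window (the "context size" varied in the study) adds more positional features per word; with a fixed vocabulary this sharpens category distinctions up to a point, after which the extra features grow increasingly sparse.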
We present a simple EM-based grammar induction algorithm for Combinatory Categorial Grammar (CCG) that achieves state-of-the-art performance by relying on a minimal number of very general linguistic principles. Unlike previous work on unsupervised parsing with CCGs, our approach has no prior language-specific knowledge, and discovers all categories automatically. Additionally, unlike other approaches, our grammar remains robust when parsing longer sentences, performing as well as or better than other systems. We believe this is a natural result of using an expressive grammar formalism with an extended domain of locality.
We present a relational learning framework for grammar induction that is able to learn meaning as well as syntax. We introduce a type of constraint-based grammar, lexicalized well-founded grammar (lwfg), and we prove that it can always be learned from a small set of semantically annotated examples, given a set of assumptions. The semantic representation chosen allows us to learn the constraints together with the grammar rules, as well as an ontology-based semantic interpretation. We performed a set of experiments showing that several fragments of natural language can be covered by an lwfg, and that it is possible to choose the representative examples heuristically, based on linguistic knowledge.