Natural Language Grammar Induction Using a Constituent-Context Model

Neural Information Processing Systems

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.

1 Overview

To enable a wide range of subsequent tasks, human language sentences are standardly given tree-structure analyses, wherein the nodes in a tree dominate contiguous spans of words called constituents, as in figure 1(a). Constituents are the linguistically coherent units in the sentence, and are usually labeled with a constituent category, such as noun phrase (NP) or verb phrase (VP).
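To make the notions of constituent span and linear context concrete, the following is a minimal sketch, not the paper's implementation. It assumes an unlabeled tree encoded as nested Python tuples with words as string leaves, and it takes "linear context" to mean the pair of words immediately to the left and right of a span. The example sentence, the function names, and the `#` boundary marker are all hypothetical choices made for illustration.

```python
from typing import List, Tuple, Union

# Assumed encoding: a tree is either a word (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


def constituent_spans(tree: Tree) -> List[Tuple[int, int]]:
    """Return (start, end) word spans, end exclusive, one per internal node."""
    spans: List[Tuple[int, int]] = []

    def walk(node: Tree, start: int) -> int:
        # A leaf covers exactly one word and is not recorded as a constituent.
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        spans.append((start, end))
        return end

    walk(tree, 0)
    return spans


def span_features(words: List[str], start: int, end: int,
                  boundary: str = "#") -> Tuple[tuple, tuple]:
    """Return (yield, context) for a span.

    The yield is the word sequence inside the span; the context is the
    (left neighbour, right neighbour) pair, with a hypothetical boundary
    marker used at the sentence edges.
    """
    left = words[start - 1] if start > 0 else boundary
    right = words[end] if end < len(words) else boundary
    return tuple(words[start:end]), (left, right)


# Hypothetical example bracketing, not taken from the paper.
tree = (("the", "cat"), ("sat", ("on", ("the", "mat"))))
words = leaves(tree)
for start, end in sorted(constituent_spans(tree)):
    print((start, end), span_features(words, start, end))
```

Every internal node yields one contiguous span, which is the property the overview describes; pairing each span's word sequence with its neighbouring words is one plausible reading of the "constituent identity and linear context" on which the model is based.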