 Grammars & Parsing


The Use of Classifiers in Sequential Inference

Neural Information Processing Systems

We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem - identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.
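
As a rough illustration of the Markovian approach, the sketch below runs Viterbi decoding in which every local score comes from an arbitrary classifier callback instead of fixed HMM tables. The function names, the B/I/O tag set, the "no I after O" constraint, and the toy probabilities are all assumptions made for this example; none of them is taken from the paper.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical signature: log-score of `state` at position t, given the
# previous state (None at t = 0) and the whole observation sequence.
LocalScore = Callable[[Optional[str], str, Sequence[str], int], float]


def viterbi(observations: Sequence[str], states: Sequence[str],
            local_logprob: LocalScore) -> List[str]:
    """Return the highest-scoring state sequence under classifier-supplied scores."""
    if not observations:
        return []
    scores = [{s: local_logprob(None, s, observations, 0) for s in states}]
    backpointers = []
    for t in range(1, len(observations)):
        row, ptr = {}, {}
        for s in states:
            # Pick the best predecessor for state s at position t.
            candidates = {p: scores[-1][p] + local_logprob(p, s, observations, t)
                          for p in states}
            best_prev = max(states, key=lambda p: candidates[p])
            row[s], ptr[s] = candidates[best_prev], best_prev
        scores.append(row)
        backpointers.append(ptr)
    # Follow backpointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy stand-in for a trained classifier (illustrative numbers only).
TOY_PROBS = {
    "the":   {"B": 0.6, "I": 0.3, "O": 0.1},
    "dog":   {"B": 0.2, "I": 0.7, "O": 0.1},
    "barks": {"B": 0.1, "I": 0.1, "O": 0.8},
}


def toy_classifier(prev: Optional[str], state: str,
                   obs: Sequence[str], t: int) -> float:
    # Assumed coherence constraint: a phrase cannot continue ("I")
    # unless the previous tag was inside a phrase ("B" or "I").
    if state == "I" and prev in (None, "O"):
        return -math.inf
    return math.log(TOY_PROBS[obs[t]][state])


STATES = ["B", "I", "O"]
print(viterbi(["the", "dog", "barks"], STATES, toy_classifier))  # ['B', 'I', 'O']
print(viterbi(["barks", "dog"], STATES, toy_classifier))         # ['O', 'B']
```

In the second call the locally preferred tag for "dog" is I, but the constraint rules it out after O, so decoding falls back to B. That is the sense in which the combined output stays coherent even when individual classifier decisions conflict.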


Convolution Kernels for Natural Language

Neural Information Processing Systems

We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
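
To make the idea of a kernel over trees concrete, here is a minimal sketch of a subtree-counting kernel: it counts the tree fragments two parse trees share without ever building the feature vectors. The abstract does not give the recursion, so the one used here (the commonly described all-subtrees formulation) is an assumption, as are the tuple-based tree encoding and the `decay` parameter.

```python
from typing import List, Tuple, Union

# Assumed encoding: an internal node is (label, [children]); a word is a plain str.
Node = Tuple[str, list]
Tree = Union[str, Node]


def production(node: Node) -> tuple:
    """The grammar rule at a node, e.g. NP -> D N, with words marked as such."""
    label, children = node
    rhs = tuple(("word", c) if isinstance(c, str) else ("nt", c[0])
                for c in children)
    return (label, rhs)


def internal_nodes(tree: Tree) -> List[Node]:
    """All non-word nodes of the tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1]:
        nodes.extend(internal_nodes(child))
    return nodes


def common_subtrees(n1: Node, n2: Node, decay: float = 1.0) -> float:
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1], n2[1]):
        if not isinstance(c1, str):
            # Each nonterminal child may stay unexpanded (the 1.0)
            # or be expanded into any fragment common to both children.
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score


def tree_kernel(t1: Tree, t2: Tree, decay: float = 1.0) -> float:
    """Sum the common-fragment counts over all pairs of internal nodes."""
    return sum(common_subtrees(a, b, decay)
               for a in internal_nodes(t1)
               for b in internal_nodes(t2))


the_dog = ("NP", [("D", ["the"]), ("N", ["dog"])])
the_cat = ("NP", [("D", ["the"]), ("N", ["cat"])])
print(tree_kernel(the_dog, the_dog))  # 6.0
print(tree_kernel(the_dog, the_cat))  # 3.0
```

This version recomputes overlapping subproblems; memoizing `common_subtrees` over node pairs would bring the cost down to the product of the two trees' node counts. Setting `decay` below 1 downweights larger fragments.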


Natural Language Grammar Induction Using a Constituent-Context Model

Neural Information Processing Systems

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.

1 Overview

To enable a wide range of subsequent tasks, human language sentences are standardly given tree-structure analyses, wherein the nodes in a tree dominate contiguous spans of words called constituents, as in figure 1(a). Constituents are the linguistically coherent units in the sentence, and are usually labeled with a constituent category, such as noun phrase (NP) or verb phrase (VP).


Automatic Acquisition and Efficient Representation of Syntactic Structures

Neural Information Processing Systems

The distributional principle according to which morphemes that occur in identical contexts belong, in some sense, to the same category [1] has been advanced as a means for extracting syntactic structures from corpus data. We extend this principle by applying it recursively, and by using mutual information for estimating category coherence. The resulting model learns, in an unsupervised fashion, highly structured, distributed representations of syntactic knowledge from corpora. It also exhibits promising behavior in tasks usually thought to require representations anchored in a grammar, such as systematicity.


Fast Exact Inference with a Factored Model for Natural Language Parsing

Neural Information Processing Systems

We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.


Kernels for Structured Natural Language Data

Neural Information Processing Systems

This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation.


Online Learning via Global Feedback for Phrase Recognition

Neural Information Processing Systems

This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reports to each local function the errors it committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and demonstrates the advantages of our global training method over optimizing each function locally and independently.


Scalable Discriminative Learning for Natural Language Parsing and Translation

Neural Information Processing Systems

Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as that of other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with tree-structured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns.


Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing

Neural Information Processing Systems

We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes.


Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models

Neural Information Processing Systems

This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
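
One way to picture how an adaptor induces dependencies among successive uses is the Chinese-restaurant view of the Pitman-Yor process: each draw either reuses a previously generated item, with probability growing with how often it was used, or generates a fresh one from a base distribution. The sketch below illustrates only that caching behavior; the class name, parameter names, and the toy base distribution are assumptions for this example, and it is not the inference algorithm the paper presents.

```python
import random
from typing import Callable, List, Optional


class PitmanYorAdaptor:
    """Caching sampler wrapped around a base generator (illustrative sketch)."""

    def __init__(self, base: Callable[[], str], discount: float = 0.5,
                 concentration: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= discount < 1.0:
            raise ValueError("discount must be in [0, 1)")
        if concentration <= -discount:
            raise ValueError("concentration must exceed -discount")
        self.base = base
        self.discount = discount
        self.concentration = concentration
        self.rng = rng or random.Random()
        self.values: List[str] = []  # one cached item per "table"
        self.counts: List[int] = []  # how often each cached item was used

    def sample(self) -> str:
        n = sum(self.counts)
        k = len(self.values)
        # Unnormalized mass for generating a fresh item from the base.
        new_mass = self.concentration + self.discount * k
        r = self.rng.random() * (n + self.concentration)
        if n == 0 or r < new_mass:
            value = self.base()
            self.values.append(value)
            self.counts.append(1)
            return value
        r -= new_mass
        # Otherwise reuse item i with mass (count_i - discount).
        for i, count in enumerate(self.counts):
            r -= count - self.discount
            if r < 0:
                self.counts[i] += 1
                return self.values[i]
        # Guard against floating-point round-off at the upper boundary.
        self.counts[-1] += 1
        return self.values[-1]


rng = random.Random(0)
# Hypothetical base distribution: in an adaptor grammar this role would be
# played by expanding a nonterminal with the underlying PCFG rules.
adaptor = PitmanYorAdaptor(lambda: rng.choice(["walk", "walks", "walked"]),
                           discount=0.5, concentration=1.0, rng=rng)
draws = [adaptor.sample() for _ in range(200)]
print(len(adaptor.values), "cached items after", len(draws), "draws")
```

Setting `discount` to 0 reduces the reuse rule to that of a Dirichlet process with the same concentration, which matches the abstract's remark that Dirichlet-process models arise from a particular choice of adaptor.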