Grammars & Parsing
Scalable Discriminative Learning for Natural Language Parsing and Translation
Turian, Joseph, Wellington, Benjamin, Melamed, I. D.
Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with tree-structured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method's versatility, accuracy, and efficiency. Relevant software is freely available at http://nlp.cs.nyu.edu/parser and http://nlp.cs.nyu.edu/GenPar.
Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models
Johnson, Mark, Griffiths, Thomas L., Goldwater, Sharon
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
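As an illustration of the idea that an adaptor induces dependencies among successive uses, the following is a minimal sketch of a Pitman-Yor "Chinese restaurant" cache wrapped around a base sampler. It is not the authors' inference algorithm or software. The class name PitmanYorAdaptor, the toy base_word generator, and the default discount and concentration values are all assumptions made for this example.

```python
import random


class PitmanYorAdaptor:
    """Hypothetical sketch: caches outputs of a base sampler and reuses them
    with Pitman-Yor (Chinese restaurant) probabilities."""

    def __init__(self, base_sampler, discount=0.5, concentration=1.0, rng=None):
        if not 0.0 <= discount < 1.0:
            raise ValueError("discount must be in [0, 1)")
        if concentration <= -discount:
            raise ValueError("concentration must exceed -discount")
        self.base_sampler = base_sampler
        self.d = discount
        self.theta = concentration
        self.rng = rng or random.Random()
        self.labels = []  # value served at each table
        self.counts = []  # number of customers at each table
        self.n = 0        # total customers so far

    def sample(self):
        # Table k is reused with weight (count_k - d); a new table is opened
        # with the remaining weight (theta + d * number_of_tables).
        r = self.rng.random() * (self.n + self.theta)
        for k, count in enumerate(self.counts):
            r -= count - self.d
            if r < 0:
                self.counts[k] += 1
                self.n += 1
                return self.labels[k]
        # New table: draw a fresh value from the base distribution.
        value = self.base_sampler(self.rng)
        self.labels.append(value)
        self.counts.append(1)
        self.n += 1
        return value


def base_word(rng):
    # Toy stand-in for a PCFG expansion: a random string over {a, b}.
    return "".join(rng.choice("ab") for _ in range(rng.randint(1, 4)))


adaptor = PitmanYorAdaptor(base_word, rng=random.Random(0))
samples = [adaptor.sample() for _ in range(200)]
```

Because previously generated values are reused with probability that grows with their counts, frequent outputs become more frequent, which is the kind of dependency among successive uses that the abstract describes. Setting the discount to zero reduces this sketch to a Dirichlet-process-style cache.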
Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
Chen, Yuanhao, Zhu, Long, Yuille, Alan L.
We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class.
Meaning and Links
This article presents some fundamental ideas about representing knowledge and dealing with meaning in computer representations. I will describe the issues as I currently understand them and describe how they came about, how they fit together, what problems they solve, and some of the things that the resulting framework can do. The ideas apply not just to graph-structured "node-and-link" representations, sometimes called semantic networks, but also to representations referred to variously as frames with slots, entities with relationships, objects with attributes, tables with columns, and records with fields and to the classes and variables of object-oriented data structures. I will start by describing some background experiences and thoughts that preceded the writing of my 1975 paper, "What's in a Link," which introduced many of these issues. After that, I will present some of the key ideas from that paper with a discussion of how some of those ideas have matured since then. Finally, I will describe some practical applications of these ideas in the context of knowledge access and information retrieval and will conclude with some thoughts about where I think we can go from here.
Practical Approach to Knowledge-based Question Answering with Natural Language Understanding and Advanced Reasoning
This research hypothesized that a practical solution framework, Natural Language Understanding and Reasoning for Intelligence (NaLURI), would solve the following problems without compromising practicality: 1) restrictions on the nature of questions and responses, and 2) limited ability to scale across domains and to real-life natural language text. NaLURI combines full-discourse natural language understanding, a powerful representation formalism capable of exploiting ontological information, and a reasoning approach with advanced features.
A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis
Hamon, Thierry, Nazarenko, Adeline, Poibeau, Thierry, Aubin, Sophie, Derivière, Julien
Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to their linguistically specific statistical distribution, or to serve as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and development of a text processing platform, Ogmios, which was developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains, and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness, and performance can be handled together.