 Grammars & Parsing


MiTAP for Biosecurity: A Case Study

AI Magazine

MiTAP (MITRE Text and Audio Processing) is a prototype system for monitoring infectious disease outbreaks and other global events. MiTAP focuses on providing timely, multilingual, global information access to medical experts and individuals involved in humanitarian assistance and relief work. Multiple information sources in multiple languages are automatically captured, filtered, translated, summarized, and categorized by disease, region, information source, person, and organization. Critical information is automatically extracted and tagged to facilitate browsing, searching, and sorting. The system supports shared situational awareness through collaboration, allowing users to submit other articles for processing, annotate existing documents, post directly to the system, and flag messages for others to see. MiTAP currently stores over 1 million articles and processes an additional 2,000 to 10,000 daily, delivering up-to-date information to dozens of regular users.


A Unified Model of Structural Organization in Language and Music

Journal of Artificial Intelligence Research

Is there a general model that can predict the perceived phrase structure in language and music? While it is usually assumed that humans have separate faculties for language and music, this work focuses on the commonalities rather than on the differences between these modalities, aiming at finding a deeper 'faculty'. Our key idea is that the perceptual system strives for the simplest structure (the 'simplicity principle'), but in doing so it is biased by the likelihood of previous structures (the 'likelihood principle'). We present a series of data-oriented parsing (DOP) models that combine these two principles and that are tested on the Penn Treebank and the Essen Folksong Collection. Our experiments show that (1) a combination of the two principles outperforms the use of either of them, and (2) exactly the same model with the same parameter setting achieves maximum accuracy for both language and music. We argue that our results suggest an interesting parallel between linguistic and musical structuring.


The Use of Classifiers in Sequential Inference

Neural Information Processing Systems

We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem: identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.
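
The general idea of combining per-token classifier outputs into one coherent label sequence can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a hypothetical B/I/O chunk labelling, made-up classifier probabilities P(state | observation), and a single hard constraint (an "I" label may not open a chunk), and it uses a plain Viterbi-style search in which transitions act only as constraints.

```python
import math

# Hypothetical label set for shallow parsing: Begin, Inside, Outside of a chunk.
STATES = ["B", "I", "O"]


def allowed(prev, cur):
    """Hard sequence constraint (an assumption): 'I' cannot open a chunk."""
    return not (prev in (None, "O") and cur == "I")


def decode(classifier_probs):
    """Viterbi-style search over per-token classifier outputs.

    classifier_probs: list of dicts, one per token, mapping each state to an
    illustrative probability P(state | observation) from some classifier.
    Returns the highest-scoring label sequence that satisfies `allowed`.
    """
    if not classifier_probs:
        return []
    # best[s] = (log score, path) of the best valid sequence ending in state s
    best = {
        s: (math.log(classifier_probs[0][s]), [s])
        for s in STATES
        if allowed(None, s) and classifier_probs[0][s] > 0
    }
    for probs in classifier_probs[1:]:
        nxt = {}
        for cur in STATES:
            if probs[cur] <= 0:
                continue
            candidates = [
                (score + math.log(probs[cur]), path + [cur])
                for prev, (score, path) in best.items()
                if allowed(prev, cur)
            ]
            if candidates:
                nxt[cur] = max(candidates, key=lambda c: c[0])
        best = nxt
    if not best:
        raise ValueError("no label sequence satisfies the constraints")
    return max(best.values(), key=lambda c: c[0])[1]


# Made-up classifier outputs for a three-token input.
example_probs = [
    {"B": 0.3, "I": 0.6, "O": 0.1},  # the classifier alone would pick 'I' here
    {"B": 0.1, "I": 0.7, "O": 0.2},
    {"B": 0.1, "I": 0.2, "O": 0.7},
]
print(decode(example_probs))
```

Picking the most probable label for each token independently would give I, I, O, which violates the constraint; the joint search instead returns a sequence that is consistent as a whole.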


Computational Approach to Anaphora Resolution in Spanish Dialogues

Journal of Artificial Intelligence Research

This paper presents an algorithm for identifying noun-phrase antecedents of pronouns and adjectival anaphors in Spanish dialogues. We believe that anaphora resolution requires numerous sources of information in order to find the correct antecedent of the anaphor. These sources can be of different kinds, e.g., linguistic information, discourse/dialogue structure information, or topic information. For this reason, our algorithm combines several kinds of information (hybrid information). The algorithm is based on linguistic constraints and preferences and uses an anaphoric accessibility space within which the algorithm finds the noun phrase. We present some experiments related to this algorithm and this space using a corpus of 204 dialogues. The algorithm is implemented in Prolog. According to this study, 95.9% of antecedents were located in the proposed space, a precision of 81.3% was obtained for pronominal anaphora resolution, and 81.5% for adjectival anaphora.


Relationship between Natural Language Processing and AI: The Role of Constrained Formal-Computational Systems

AI Magazine

Modeling various aspects of language (syntax, semantics, pragmatics, and discourse, among others) by the use of constrained formal-computational systems, just adequate for such modeling, has proved to be an effective research strategy, leading to deep understanding of these aspects, with implications for both machine processing and human processing. This approach enables one to distinguish between the universal and stipulative constraints. This is in contrast to an approach where we start with the most powerful formal-computational system and then model the phenomena by making all constraints stipulative in a sense. The use of constrained systems for modeling leads to some novel ways of describing locality of structures and brings out the relationship between the complexity of description of primitives and local computations over them. These ideas serve to unify theoretical, computational, and statistical aspects of natural language processing in AI. It is expected that this approach will also be productive in other domains of AI.


Statistical Techniques for Natural Language Parsing

AI Magazine

I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be attacked successfully by statistical techniques and also serves as a good warm-up for the main topic: statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions.


Corpus-Based Approaches to Semantic Interpretation in NLP

AI Magazine

In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). Most empirical NLP work to date has focused on relatively low-level language processing such as part-of-speech tagging, text segmentation, and syntactic parsing. The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis -- uncovering the meaning of an utterance. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.