Modeling various aspects of language--syntax, semantics, pragmatics, and discourse, among others--by the use of constrained formal-computational systems, just adequate for such modeling, has proved to be an effective research strategy, leading to deep understanding of these aspects, with implications for both machine processing and human processing. This approach enables one to distinguish between the universal and stipulative constraints.
I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and also serves as a good warm-up for the main topic--statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions. Throughout, I adopt the standard abbreviations: s for sentence, np for noun phrase, vp for verb phrase, and det for determiner. It is generally accepted that finding the sort of structure shown in figure 1 is useful in determining the meaning of a sentence.
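The sort of structure the abstract refers to can be illustrated with a small sketch (not from the article itself): a toy parse tree for the sentence "the dog barked", encoded as nested tuples and rendered in the labeled-bracket notation standard in the parsing literature, using the abbreviations s, np, vp, and det introduced above. The sentence and helper names are illustrative assumptions.

```python
# Illustrative sketch: a parse tree for "the dog barked", using the
# abbreviations from the text (s, np, vp, det). The tree is a nested
# tuple whose first element is the constituent label.
tree = ("s",
        ("np", ("det", "the"), ("noun", "dog")),
        ("vp", ("verb", "barked")))

def leaves(node):
    """Collect the words at the leaves of a nested-tuple parse tree."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

def bracketed(node):
    """Render the tree in labeled-bracket notation."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"

print(leaves(tree))      # ['the', 'dog', 'barked']
print(bracketed(tree))   # (s (np (det the) (noun dog)) (vp (verb barked)))
```

A statistical parser's job, in these terms, is to choose the most probable such tree for a given word string.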
This report is a review of the First National Conference on Knowledge Representation and Inference in Sanskrit, Bangalore, India, 20 through 22 December 1986. The conference was inspired by an article that appeared in the Spring 1985 issue of AI Magazine--"Knowledge Representation in Sanskrit and Artificial Intelligence." A working group has been created to pursue the goals of the conference and to possibly arrange another conference for 1987 and 1988. This conference is analogous to the consultation of philosophers and cognitive psychologists by computer scientists in the beginnings of AI. Western psychology and philosophy are quite different from the Indo-Aryan tradition: the former has its basis in Aristotelian logic and the scientific method, whereas the latter is also based on introspection and internal experience. Nevertheless, both these schools have converged in the analysis of natural language and the extraction of the semantic message from a text. The purpose of AI in this context is to derive a "method" for natural language understanding; the purpose for the Sanskrit scholars was to understand the nature of language and thought in and of itself. Hence, for the Sanskrit scholars, the actual methodology was implicit; it was not the focus. The purpose of the conference was to extract this hidden "algorithm" of automatic semantic parsing from the Sanskrit pandits.
In the past twenty years, much time, effort, and money have been expended on designing an unambiguous representation of natural languages to make them accessible to computer processing. These efforts have centered around creating schemata designed to parallel logical relations with relations expressed by the syntax and semantics of natural languages, which are clearly cumbersome and ambiguous in their function as vehicles for the transmission of logical data. Understandably, there is a widespread belief that natural languages are unsuitable for the transmission of many ideas that artificial languages can render with great precision and mathematical rigor. But this dichotomy, which has served as a premise underlying much work in the areas of linguistics and artificial intelligence, is a false one. There is at least one language, Sanskrit, which for the duration of almost 1000 years was a living spoken language with a considerable literature of its own. Besides works of literary value, there was a long philosophical and grammatical tradition that has continued to exist with undiminished vigor until the present century. Among the accomplishments of the grammarians can be reckoned a method for paraphrasing Sanskrit in a manner that is identical not only in essence but in form with current work in Artificial Intelligence. This article demonstrates that a natural language can serve as an artificial language also, and that much work in AI has been reinventing a wheel millennia old. First, a typical Knowledge Representation Scheme (using Semantic Nets) will be laid out, followed by an outline of the method used by the ancient Indian Grammarians to analyze sentences unambiguously. Finally, the clear parallelism between the two will be demonstrated, and the theoretical implications of this equivalence will be given.
The other articles in the NL chapter of the Handbook include a historical sketch of machine translation from one language to another, which was the subject of the very earliest ideas about processing language with computers; technical articles on some of the grammars and parsing techniques that AI researchers have used in their programs; and an article on text generation, the creation of sentences by the program. Finally, there are several articles describing the NL programs themselves: the early systems of the 1960s and the major research projects of the last decade, including Wilks's machine translation system, Winograd's SHRDLU, Woods's LUNAR, Schank's MARGIE, SAM, and PAM, and Hendrix's LIFER. Two other chapters of the Handbook are especially relevant to NL research. Speech understanding research attempts to build computer interfaces that understand spoken language. In the 1970s, speech and natural language understanding research were often closely linked.
An important issue in achieving acceptance of computer systems used by the nonprogramming community is the ability to communicate with these systems in natural language. Often, a great deal of time in the design of any such system is devoted to the natural language front end. An obvious way to simplify this task is to provide a portable natural language front-end tool or facility that is sophisticated enough to allow for a reasonable variety of input; allows modification; and yet is easy to use. This paper describes such a tool that is based on augmented transition networks (ATNs). It allows for user input to be in sentence or nonsentence form or both, provides a detailed parse tree that the user can access, and also provides the facility to generate responses and save information.
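To make the ATN idea concrete, here is a minimal, hypothetical sketch of the recursive transition network at the core of an ATN (omitting the registers and arbitrary tests that make it "augmented"): each network is a list of arcs, a CAT arc consumes a word of the named category, and a PUSH arc recursively invokes a subnetwork such as NP. The networks, lexicon, and state names are illustrative assumptions, not the paper's actual tool.

```python
# Hypothetical sketch of the transition-network idea behind an ATN front
# end. Each network is a list of (state, arc_kind, label, next_state);
# a PUSH arc recursively invokes a subnetwork (here, NP).
NETWORKS = {
    "S":  [("q0", "PUSH", "NP",   "q1"),
           ("q1", "CAT",  "verb", "q2")],   # q2 is the final state
    "NP": [("q0", "CAT",  "det",  "q1"),
           ("q1", "CAT",  "noun", "q2")],
}
LEXICON = {"the": "det", "a": "det", "dog": "noun", "ball": "noun",
           "barked": "verb", "bounced": "verb"}
FINAL = "q2"

def parse(net, words, state="q0"):
    """Return leftover words if `net` accepts a prefix of `words`, else None."""
    if state == FINAL:
        return words
    for (src, kind, label, dst) in NETWORKS[net]:
        if src != state:
            continue
        if kind == "CAT" and words and LEXICON.get(words[0]) == label:
            rest = parse(net, words[1:], dst)
            if rest is not None:
                return rest
        elif kind == "PUSH":
            rest = parse(label, words)          # descend into subnetwork
            if rest is not None:
                rest2 = parse(net, rest, dst)   # resume in the outer network
                if rest2 is not None:
                    return rest2
    return None

print(parse("S", ["the", "dog", "barked"]) == [])   # True: sentence accepted
```

A real ATN would additionally set registers on each arc (subject, object, and so on), which is how the detailed parse tree mentioned above is built up during traversal.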
The Workshop on Future Directions in NLP was held at Bolt Beranek and Newman, Inc. (BBN), in Cambridge, Massachusetts, from 29 November to 1 December 1989. The workshop was organized and hosted by Madeleine Bates and Ralph Weischedel of the BBN Speech and Natural Language Department and sponsored by BBN's Science Development Program. Thirty-six leading researchers and government representatives gathered to discuss the direction of the field of natural language processing (NLP) over the next 5 to 10 years. The intent of the symposium was "to make the conference and resulting volume an intellectual landmark for the field of NLP." This brief article summarizes the invited papers and strategic planning discussions of the workshop.
This article surveys the use of empirical, machine-learning methods for a particular natural language-understanding task--information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture. Author Eugene Charniak and coauthors Ng Hwee Tou and John Zelle, for example, describe techniques for part-of-speech tagging, parsing, and word-sense disambiguation. These techniques were created with no specific domain or high-level language-processing task in mind. In contrast, my article surveys the use of empirical methods for a particular natural language-understanding task that is inherently domain specific.
In recent years, there has been a flurry of research into empirical, corpus-based learning approaches to natural language processing (NLP). Most empirical NLP work to date has focused on relatively low-level language processing such as part-of-speech tagging, text segmentation, and syntactic parsing. The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis--uncovering the meaning of an utterance. This article is an introduction to some of the emerging research in the application of corpus-based learning techniques to problems in semantic interpretation. In particular, we focus on two important problems in semantic interpretation, namely, word-sense disambiguation and semantic parsing.
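The flavor of corpus-based word-sense disambiguation can be sketched in a few lines (an illustrative toy, not any system from the article): choose the sense of an ambiguous word, such as "bank", whose labeled training contexts best match the test context, here with a bare-bones add-one-smoothed naive Bayes model over context words. The toy corpus and sense labels are invented for illustration.

```python
# Hypothetical sketch of corpus-based word-sense disambiguation for "bank":
# add-one-smoothed naive Bayes over context words, trained on a toy corpus.
import math
from collections import Counter

train = [  # toy labeled corpus: (sense, context words)
    ("money", ["deposit", "cash", "loan"]),
    ("money", ["loan", "interest", "account"]),
    ("river", ["water", "shore", "fishing"]),
    ("river", ["muddy", "water", "flood"]),
]

def disambiguate(context):
    """Pick the sense with the highest smoothed log-probability of `context`."""
    counts = {sense: Counter() for sense, _ in train}
    priors = Counter(sense for sense, _ in train)
    for sense, words in train:
        counts[sense].update(words)
    vocab = {w for _, words in train for w in words}
    best, best_lp = None, -math.inf
    for sense in counts:
        total = sum(counts[sense].values())
        lp = math.log(priors[sense] / len(train))       # log prior
        for w in context:                               # log likelihoods
            lp += math.log((counts[sense][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

print(disambiguate(["cash", "loan"]))    # money
print(disambiguate(["muddy", "shore"]))  # river
```

Real systems of the kind the article surveys differ mainly in scale and features (larger sense-tagged corpora, richer context representations), not in this basic learn-from-labeled-examples design.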
Founded early in 1983, the Center for the Study of Language and Information (CSLI) at Stanford University grew out of a longstanding collaboration between scientists at research laboratories in the Palo Alto area and the faculty and students of several Stanford University departments and out of a need for an institutional focus for this work on natural and computer languages. At present, CSLI has 17 senior members and about as many associate members, from SRI International, Xerox PARC, Fairchild, and the Departments of Computer Science, Linguistics, and Philosophy at Stanford. Since the Center's research will overlap with the work of other researchers around the world, an important goal of CSLI is to initiate a major outreach, whereby members of CSLI both inform themselves of work done elsewhere and share their own results with others. This collection of projects aims at developing scientific theories of natural-language use consonant with our basic perspective on language users as finite information processors. Questions about CSLI or Program SL should be addressed to Elizabeth Macken, Assistant Director, CSLI, Ventura Hall, Stanford University, Stanford, CA 94305.