Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e.
The application of stochastic context-free grammars to the determination of RNA foldings allows a simple description of the subclass of sought secondary structures, but it needs efficient parsing algorithms. The more classic thermodynamic model of folding, popularized by Zuker under the framework of dynamic programming algorithms, allows an easy computation of foldings but its use is delicate when constraints have to be introduced on sought secondary structures. We show here that S-attribute grammars unify these two models and we introduce a parsing algorithm whose efficiency enables us to handle problems until then too difficult or too large to deal with. As a matter of fact, our algorithm is as efficient as a standard dynamic programming one when applied to the thermodynamic model (yet it offers a greater flexibility for the expression of constraints) and it is faster and saves more space than other parsing algorithms used so far for stochastic grammars.
Introduction
In RNA, interactions between nucleotides form base pairs and, seen at a higher level, characteristic secondary structure motifs such as helices, loops and bulges. When multiple RNA sequences must be aligned, both primary structure and secondary structure need to be considered since elucidation of common folding patterns may indicate some pertinent regions to be aligned and vice versa (Sankoff 1985). Several methods have been established for predicting RNA secondary structure. The first method is phylogenetic analysis of homologous RNA molecules.
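To make the dynamic programming view of folding mentioned above concrete, here is a minimal sketch of a Nussinov-style base-pair maximization. This is a deliberately simpler stand-in, not Zuker's thermodynamic energy model and not the S-attribute grammar parser the abstract describes. The function name `max_base_pairs`, the `min_loop` parameter, and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for illustration only.

```python
def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    Illustrative Nussinov-style recurrence (an assumption of this sketch,
    not the method of the paper): position j is either unpaired, or paired
    with some k, which splits the interval into two independent parts.
    """
    # Allowed pairs: Watson-Crick plus G-U wobble (assumed for this sketch).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0

    # best[i][j] = maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]

    # Fill by increasing interval length; shorter intervals cannot pair
    # because a hairpin loop needs at least min_loop unpaired bases.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: j stays unpaired
            # case 2: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    return best[0][n - 1]
```

For instance, `max_base_pairs("GGGAAAUCC")` counts the three nested pairs of a single hairpin. The table fill takes cubic time and quadratic space in the sequence length; an energy-based model would replace the `+ 1` per pair with loop- and stacking-dependent energy terms.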
Children are facile at both discovering word boundaries and using those words to build higher-level structures in tandem. Current research treats lexical acquisition and grammar induction as two distinct tasks. Doing so has led to unreasonable assumptions. Existing work in grammar induction presupposes a perfectly segmented, noise-free lexicon, while lexical learning approaches largely ignore how the lexicon is used. This paper combines both tasks in a novel framework for bootstrapping lexical acquisition and grammar induction.
ABSTRACT In modern user interfaces, graphics play an important role in the communication between human and computer. When a person employs text and graphic objects in communication, those objects have meaning under a system of interpretation, or "visual language." The research described in this paper aims at spatially parsing expressions in formal visual languages to recover their underlying syntactic structure. Such "spatial parsing" allows a general purpose graphics editor to be used as a visual language interface, giving the user the freedom to first simply create some text and graphics, and The task of spatial parsing can be simplified for the interface designer/implementer through the use of visual grammars. For each of the four formal visual languages described in this paper, there is a specifiable set of spatial arrangements of elements for well-formed visual expressions in that language. Visual Grammar Notation is a way to describe those sets of spatial arrangements; the context-free grammars expressed in this notation are not only visual, but also machine-readable, and are used directly to guide the parsing.
This paper presents a method for inducing transformation rules that map natural-language sentences into a formal query or command language. The approach assumes a formal grammar for the target representation language and learns transformation rules that exploit the non-terminal symbols in this grammar. The learned transformation rules incrementally map a natural-language sentence or its syntactic parse tree into a parse tree for the target formal language. Experimental results are presented for two corpora, one that maps English instructions into an existing formal coaching language for simulated RoboCup soccer agents, and another that maps English U.S.-geography questions into a database query language. We show that our method performs better and faster overall than previous approaches in both domains.