Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, meanings. This paper explores the learnability of syntax (i.e.
The application of stochastic context-free grammars to the determination of RNA foldings allows a simple description of the subclass of sought secondary structures, but it requires efficient parsing algorithms. The more classic thermodynamic model of folding, popularized by Zuker within the framework of dynamic programming algorithms, allows an easy computation of foldings, but its use is delicate when constraints must be imposed on the sought secondary structures. We show here that S-attribute grammars unify these two models, and we introduce a parsing algorithm whose efficiency enables us to handle problems that were previously too difficult or too large to deal with. In fact, our algorithm is as efficient as a standard dynamic programming algorithm when applied to the thermodynamic model (while offering greater flexibility in the expression of constraints), and it is faster and uses less space than other parsing algorithms used so far for stochastic grammars.

Introduction

In RNA, interactions between nucleotides form base pairs and, seen at a higher level, characteristic secondary structure motifs such as helices, loops, and bulges. When multiple RNA sequences must be aligned, both primary structure and secondary structure need to be considered, since elucidation of common folding patterns may indicate pertinent regions to be aligned, and vice versa (Sankoff 1985). Several methods have been established for predicting RNA secondary structure. The first method is phylogenetic analysis of homologous RNA molecules.
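The dynamic-programming style of folding mentioned above can be illustrated with a minimal sketch. This is not the Zuker thermodynamic algorithm or the paper's S-attribute parser; it is the simpler Nussinov-style recurrence that maximizes the number of base pairs, which shares the same O(n^3) table-filling structure. All names here are illustrative.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of base pairs in an RNA sequence (Nussinov-style DP).

    dp[i][j] = best count for subsequence seq[i..j]; position i is either
    left unpaired, or paired with some k (enclosing a hairpin of at least
    min_loop unpaired bases).  Simplified illustration, not an energy model.
    """
    valid = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # widen intervals outward
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                  # case: i unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in valid:    # case: i pairs with k
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `nussinov_max_pairs("GGGAAACCC")` finds the three nested G-C pairs of a simple hairpin stem. Constraints on the sought structures (the point of the grammar-based view) would have to be bolted onto such a recurrence by hand, which is exactly the inflexibility the abstract alludes to.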
Children are facile at both discovering word boundaries and using those words to build higher-level structures in tandem. Current research treats lexical acquisition and grammar induction as two distinct tasks. Doing so has led to unreasonable assumptions. Existing work in grammar induction presupposes a perfectly segmented, noise-free lexicon, while lexical learning approaches largely ignore how the lexicon is used. This paper combines both tasks in a novel framework for bootstrapping lexical acquisition and grammar induction.
Most pattern-recognition problems start from observations generated by some structured stochastic process. Probabilistic context-free grammars (PCFGs) (Gonzalez & Thomason 1978; Charniak 1993) have provided a useful method for modeling uncertainty in a wide range of structures, including programming languages (Wetherell 1980), images (Chou 1989), speech signals (Ney 1992), and RNA sequences (Sakakibara et al. 1995). Domains like plan recognition, where non-probabilistic grammars have provided useful models (Vilain 1990), may also benefit from an explicit stochastic model. Once we have created a PCFG model of a process, we can apply existing PCFG parsing algorithms to answer a variety of queries. However, these techniques are limited in the types of evidence they can exploit and the types of queries they can answer.
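One standard query a PCFG supports is the probability that the grammar generates a given string, summed over all of its parses (the inside algorithm). The sketch below runs it on an assumed toy grammar, S -> S S (0.3) | 'a' (0.7), chosen only for illustration; it is not a grammar from any of the cited works.

```python
def inside_prob(s, p_branch=0.3, p_term=0.7):
    """String probability under the toy PCFG  S -> S S | 'a'.

    beta[i][j] is the inside probability that S derives s[i..j];
    longer spans sum over every binary split point k.
    """
    n = len(s)
    if n == 0:
        return 0.0
    beta = [[0.0] * n for _ in range(n)]
    for i, ch in enumerate(s):          # base case: preterminal rule
        if ch == "a":
            beta[i][i] = p_term
    for span in range(1, n):            # recurrence: S -> S S
        for i in range(n - span):
            j = i + span
            beta[i][j] = sum(p_branch * beta[i][k] * beta[k + 1][j]
                             for k in range(i, j))
    return beta[0][n - 1]
```

Here `inside_prob("aa")` is 0.3 * 0.7 * 0.7 = 0.147, and `"aaa"` sums over its two bracketings. The limitation noted in the abstract is that such parsers condition only on the terminal string itself; partial or uncertain observations, as arise in plan recognition, fall outside what this query form can express.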
We present a relational learning framework for grammar induction that is able to learn meaning as well as syntax. We introduce a type of constraint-based grammar, the lexicalized well-founded grammar (lwfg), and we prove that it can always be learned from a small set of semantically annotated examples, given a set of assumptions. The semantic representation chosen allows us to learn the constraints together with the grammar rules, as well as an ontology-based semantic interpretation. We performed a set of experiments showing that several fragments of natural language can be covered by an lwfg, and that it is possible to choose the representative examples heuristically, based on linguistic knowledge.