The Computational Linguistics of Biological Sequences

Classics (Collection 2)

Shortly after Watson and Crick's discovery of the structure of DNA, and at about the same time that the genetic code and the essential facts of gene expression were being elucidated, the field of linguistics was being similarly revolutionized by the work of Noam Chomsky [Chomsky, 1955, 1957, 1959, 1963, 1965]. Observing that a seemingly infinite variety of language was available to individual human beings based on clearly finite resources and experience, he proposed a formal representation of the rules or syntax of language, called generative grammar, that could provide finite--indeed, concise--characterizations of such infinite languages. Just as the breakthroughs in molecular biology in that era served to anchor genetic concepts in physical structures and opened up entirely novel experimental paradigms, so did Chomsky's insight serve to energize the field of linguistics, with putative correlates of cognitive processes that could for the first time be reasoned about. While Chomsky and his followers built extensively upon this foundation in the field of linguistics, generative grammars were also soon integrated into the framework of the theory of computation, and in addition now form the basis for efforts of computational linguists to automate the processing and understanding of human language. Since it is quite commonly asserted that DNA is a richly-expressive language for specifying the structures and processes of life, also with the potential for a seemingly infinite variety, it is surprising that relatively little has been done to apply to biological sequences the extensive results and methods developed over the intervening decades in the field of formal language theory.
While such an approach has been proposed [Brendel and Busse, 1984], most investigations along these lines have used grammar formalisms as tools for what are essentially information-theoretic studies [Ebeling and Jimenez-Montano, 1980; Jimenez-Montano, 1984], or have involved statistical analyses at the level of vocabularies (reflecting a more traditional notion of comparative linguistics) [Brendel et al., 1986; Pevzner et al., 1989a,b; Pietrokovski et al., 1990].

Artificial Intelligence: A General Survey (The Lighthill Report)


In forming such a view the Council has available to it a great deal of specialist information through its structure of Boards and Committees--particularly from the Engineering Board and its Computing Science Committee and from the Science Board and its Biological Sciences Committee. To supplement the important mass of specialist and detailed information available to the Science Research Council, its Chairman decided to commission an independent report by someone outside the AI field but with substantial general experience of research work in multidisciplinary fields including fields with mathematical, engineering and biological aspects. Such a personal view of the subject might be helpful to other lay persons such as Council members in the process of preparing to study specialist reports and recommendations and working towards detailed policy formation and decision taking. In scientific applications, there is a similar look beyond conventional data processing to the problems involved in large-scale data banking and retrieval. The vast field of chemical compounds is one which has lent itself to ingenious and effective programs for data storage and retrieval and for the inference of chemical structure from mass-spectrometry and other data.

The technology chess program


A chess program has been developed which plays good chess (for a program) using a very simple structure. It is based on a brute force search of the move tree with no forward pruning, using material as the only terminal evaluation function, and using a limited positional analysis at the top level for a tiebreak between moves which are materially equal. Because of the transparent structure, this program is proposed as a technological benchmark for chess programs which will continue to improve as computer technology increases.
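The search scheme described above can be sketched in a few lines. This is a minimal illustration, not the program's actual code: it uses a full-width negamax search to a fixed depth with a material-only score at the leaves, and a positional score consulted only at the top level to break material ties. The toy game tree and the names `TREE`, `MATERIAL`, `POSITIONAL`, `negamax`, and `best_move` are all hypothetical stand-ins for a real chess move generator and evaluator.

```python
from typing import Dict

# Hypothetical toy game tree: state -> {move: next_state}.
TREE: Dict[str, Dict[str, str]] = {
    "root": {"a": "A", "b": "B", "c": "C"},
    "A": {"x": "A1", "y": "A2"},
    "B": {"x": "B1", "y": "B2"},
    "C": {"x": "C1"},
}
# Material balance at each leaf, from the point of view of the side to move there.
MATERIAL: Dict[str, int] = {"A1": 1, "A2": 3, "B1": 1, "B2": 5, "C1": 0}
# Positional scores for the root moves only (assumed values, used as a tiebreak).
POSITIONAL: Dict[str, int] = {"a": 0, "b": 2, "c": 1}


def negamax(state: str, depth: int) -> int:
    """Search every move to a fixed depth (no forward pruning)."""
    moves = TREE.get(state, {})
    if depth == 0 or not moves:
        # Material is the only terminal evaluation.
        return MATERIAL.get(state, 0)
    return max(-negamax(child, depth - 1) for child in moves.values())


def best_move(state: str, depth: int) -> str:
    """Pick the move with the best material value; break ties positionally."""
    scored = []
    for move, child in TREE[state].items():
        material_value = -negamax(child, depth - 1)
        scored.append((material_value, POSITIONAL.get(move, 0), move))
    # Tuples compare by material first, then by the positional tiebreak.
    return max(scored)[2]


chosen = best_move("root", 2)
```

In this toy tree, moves `a` and `b` are materially equal after a two-ply search, so the positional score alone decides between them.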

Description and theoretical analysis (using schemata) of PLANNER, a language for proving theorems and manipulating models in a robot


Abstract: PLANNER is a formalism for proving theorems and manipulating models in a robot. The formalism is built out of a number of problem-solving primitives together with a hierarchical multiprocess backtrack control structure. Under BACKTRACK control structure, the hierarchy of activations of functions previously executed is maintained so that it is possible to revert to any previous state. In addition PLANNER uses multiprocessing so that there can be multiple loci of control over the problem-solving.

Mathematical and computational models of transformational grammar


We were led to this comparison by the observation that the computer model is weaker in three important ways: search depth is not unbounded, structures matching variables cannot be compared, and structures matching variables cannot be moved. Thus, every recursively enumerable language is generated by a transformational grammar with limited search depth, without equality comparisons of variables, and without moving structures corresponding to variables. On the other hand, both mathematical models allow unbounded depth of analysis; both allow equality comparisons of variables, although the Ginsburg-Partee model compares

Human problem solving


The aim of the book is to advance the understanding of how humans think. It seeks to do so by putting forth a theory of human problem solving, along with a body of empirical evidence that permits assessment of the theory. Englewood Cliffs, N.J.: Prentice-Hall

QA4: A procedural calculus for intuitive reasoning


Abstract: This report presents a language, called QA4, designed to facilitate the construction of problem-solving systems used for robot planning, theorem proving, and automatic program synthesis and verification. Thus it provides many useful programming aids. More importantly, however, it provides a semantic framework for common sense reasoning about these problem domains. The interpreter for the language is extraordinarily general, and is therefore an adaptable tool for developing the specialized techniques of intuitive, symbolic reasoning used by the intelligent systems.

Some new directions in robot problem solving


For the past several years research on robot problem-solving methods has centered on what may one day be called 'simple' plans: linear sequences of actions to be performed by single robots to achieve single goals in static environments. This process of forming new subgoals and new states continues until a state is produced in which the original goal is provable; the sequence of operators producing that state is the desired solution. In the case of a single goal wff, the objective is quite simple: achieve the goal (possibly while minimizing some combination of planning and execution cost). The objective of the system is to achieve the single positive goal (perhaps while minimizing search and execution costs) while avoiding absolutely any state satisfying the negative goal.
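The idea of reaching a positive goal while never entering a state that satisfies a negative goal can be illustrated with a small planner. The sketch below is an assumption-laden simplification: it uses plain breadth-first forward search over operators with precondition, add, and delete lists, whereas the systems discussed form subgoals by other means. The `Operator` type, the `plan` and `move` functions, and the room names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]     # atoms that must hold before the action
    add: FrozenSet[str]     # atoms made true by the action
    delete: FrozenSet[str]  # atoms made false by the action


def plan(initial: Iterable[str], goal: Iterable[str], forbidden: Iterable[str],
         operators: List[Operator]) -> Optional[List[str]]:
    """Return a shortest operator sequence reaching `goal`, or None.

    Any state containing every atom of a non-empty `forbidden` set is never entered.
    """
    start, goal, forbidden = frozenset(initial), frozenset(goal), frozenset(forbidden)
    if forbidden and forbidden <= start:
        return None
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # the goal holds in this state
            return steps
        for op in operators:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                if forbidden and forbidden <= nxt:
                    continue  # avoid the negative goal absolutely
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


def move(src: str, dst: str) -> Operator:
    """Hypothetical operator: the robot walks from room `src` to room `dst`."""
    return Operator(f"{src}_to_{dst}", frozenset({f"at_{src}"}),
                    frozenset({f"at_{dst}"}), frozenset({f"at_{src}"}))


OPS = [move("a", "hall"), move("hall", "b"),
       move("a", "c"), move("c", "d"), move("d", "b")]

unconstrained = plan({"at_a"}, {"at_b"}, set(), OPS)
avoiding_hall = plan({"at_a"}, {"at_b"}, {"at_hall"}, OPS)
```

With no negative goal the planner takes the short route through the hall; with `at_hall` forbidden it returns the longer route through rooms `c` and `d`.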

Learning and executing generalized robot plans


"In this paper we describe some major new additions to the STRIPS robot problem-solving system. The first addition is a process for generalizing a plan produced by STRIPS so that problem-specific constants appearing in the plan are replaced by problem-independent parameters.The generalized plan, stored in a convenient format called a triangle table, has two important functions. The more obvious function is as a single macro action that can be used by STRIPS—either in whole or in part—during the solution of a subsequent problem. Perhaps less obviously, the generalized plan also plays a central part in the process that monitors the real-world execution of a plan, and allows the robot to react "intelligently" to unexpected consequences of actions.We conclude with a discussion of experiments with the system on several example problems."Artificial Intelligence 3:251-288

Some techniques for proving correctness of programs which alter data structures


We will extend Floyd's proof system for flow diagrams to handle commands which process lists. McCarthy and Painter (1967) deal with arrays by introducing 'change' and 'access' functions so as to write a[i] := a[j] + 1 as a := change(a, i, access(a, j) + 1). King (1969), in mechanising Floyd's technique, gives a method for such assignments which, however, introduces case analysis that sometimes becomes unwieldy. Let us recall briefly the technique of Floyd (1967) for proving correctness of programs in flow diagram form. We will here retain the inductive method of Floyd for dealing with flow diagrams containing loops, but give methods for coping with more complex kinds of assignment command.
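The 'change' and 'access' treatment of arrays can be illustrated with pure functions. This is a minimal sketch, assuming the usual reading in which an array assignment such as a[i] := a[j] + 1 becomes a := change(a, i, access(a, j) + 1). The tuple representation and the function bodies are illustrative assumptions, not taken from McCarthy and Painter.

```python
from typing import Tuple

Array = Tuple[int, ...]  # an immutable array value (assumed representation)


def access(a: Array, i: int) -> int:
    """Return the element of array value `a` at index `i`."""
    return a[i]


def change(a: Array, i: int, v: int) -> Array:
    """Return a new array value equal to `a` except that index `i` holds `v`."""
    if not 0 <= i < len(a):
        raise IndexError(i)
    return a[:i] + (v,) + a[i + 1:]


a = (10, 20, 30)
i, j = 0, 2
# The assignment a[i] := a[j] + 1, written as an ordinary assignment to a variable.
b = change(a, i, access(a, j) + 1)
```

Because `change` returns a fresh value and leaves `a` untouched, the assignment can be reasoned about like an assignment to a simple variable: reading back the changed index yields the new value, and every other index is unaffected.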