The Computational Linguistics of Biological Sequences

Classics (Collection 2)

Shortly after Watson and Crick's discovery of the structure of DNA, and at about the same time that the genetic code and the essential facts of gene expression were being elucidated, the field of linguistics was being similarly revolutionized by the work of Noam Chomsky [Chomsky, 1955, 1957, 1959, 1963, 1965]. Observing that a seemingly infinite variety of language was available to individual human beings based on clearly finite resources and experience, he proposed a formal representation of the rules or syntax of language, called generative grammar, that could provide finite--indeed, concise--characterizations of such infinite languages. Just as the breakthroughs in molecular biology in that era served to anchor genetic concepts in physical structures and opened up entirely novel experimental paradigms, so did Chomsky's insight serve to energize the field of linguistics, with putative correlates of cognitive processes that could for the first time be reasoned about. While Chomsky and his followers built extensively upon this foundation in the field of linguistics, generative grammars were also soon integrated into the framework of the theory of computation, and in addition now form the basis for efforts of computational linguists to automate the processing and understanding of human language. Since it is quite commonly asserted that DNA is a richly expressive language for specifying the structures and processes of life, also with the potential for a seemingly infinite variety, it is surprising that relatively little has been done to apply to biological sequences the extensive results and methods developed over the intervening decades in the field of formal language theory.
While such an approach has been proposed [Brendel and Busse, 1984], most investigations along these lines have used grammar formalisms as tools for what are essentially information-theoretic studies [Ebeling and Jimenez-Montano, 1980; Jimenez-Montano, 1984], or have involved statistical analyses at the level of vocabularies (reflecting a more traditional notion of comparative linguistics) [Brendel et al., 1986; Pevzner et al., 1989a,b; Pietrokovski et al., 1990].
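
To make the idea of a finite grammar characterizing an infinite language concrete, here is a minimal Python sketch. The grammar is a hypothetical toy chosen for illustration and is not taken from the source: a single nonterminal S with four pairing rules and an empty rule, which generates exactly the DNA strings that equal their own reverse complement (an idealized stem with no loop). The names PAIRS, derive, and accepts are assumptions introduced here.

```python
# Hypothetical toy grammar (an illustration, not the source's formalism):
#   S -> A S T | T S A | G S C | C S G | (empty)
# Five rules describe an unbounded set of strings.
PAIRS = [("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")]


def derive(depth):
    """Yield every string derivable from S using exactly `depth` pairing rules."""
    if depth == 0:
        yield ""  # apply the empty rule and stop
        return
    for left, right in PAIRS:
        for inner in derive(depth - 1):
            yield left + inner + right


def accepts(seq):
    """Return True if `seq` can be derived from S."""
    complement = dict(PAIRS)
    while seq:
        # The outermost bases must form a complementary pair.
        if len(seq) < 2 or complement.get(seq[0]) != seq[-1]:
            return False
        seq = seq[1:-1]  # strip the pair and check the inner string
    return True


if __name__ == "__main__":
    print(list(derive(1)))
    print(accepts("GAATTC"), accepts("GATTACA"))
```

Each extra level of derivation multiplies the number of generated strings by four, so the same five rules cover strings of any even length. Under these assumptions, `accepts` recognizes a sequence such as GAATTC (which is its own reverse complement) and rejects GATTACA.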

A look at biological and machine perception


Although it is convenient for experimental purposes to think of perception in stimulus-response terms, the immense contribution of stored data, required for prediction, makes us see perception as largely cognitive. Although there must be physiological mechanisms to carry out the cognitive logical processes, of generalising and selecting stored data, the concepts we need for understanding what the physiology is carrying out are not part of physiology. This makes parallel processing convenient for biological computing, and serial computing more convenient for man-made computers. If so, biological perception seems to demonstrate powers of parallel processing, while computers demonstrate very different powers of serial processing.

Perception, picture processing and computers


The machines (usually digital computers) will classify simple shapes and printed letters and digits represented (by means of a suitable television scanner) as a matrix of 1s and 0s. Psychologists concerned with analysing complex behaviour often have to select an appropriate level of description. Let us confine our attention to three levels: words, phrases, sentences. Figure 1 shows a set of rules subdivided into numbered groups (1, 2, 3,..., 6).

Principles of neurodynamics: Perceptrons and the theory of brain mechanisms


In Chapter 2, a brief review of the main alternative approaches to the development of brain models is presented. Chapter 4 presents basic definitions and some of the notation to be used in later sections. Parts II and III are devoted to a summary of the established theoretical results obtained to date. Part II (Chapters 5 through 14) deals with the theory of three-layer series-coupled perceptrons, on which most work has been done to date.

A selected descriptor indexed bibliography to the literature on artificial intelligence


This listing is intended as an introduction to the literature on Artificial Intelligence, i.e., to the literature dealing with the problem of making machines behave intelligently. We have divided this area into categories and cross-indexed the references accordingly. Large bibliographies without some classification facility are next to useless. This particular field is still young, but there are already many instances in which workers have wasted much time in rediscovering (for better or for worse) schemes already reported. In the last year or two this problem has become worse, and in such a situation just about any information is better than none. This bibliography is intended to serve just that purpose: to present some information about this literature. The selection was confined mainly to publications directly concerned with construction of artificial problem-solving systems. Many peripheral areas are omitted completely or represented only by a few citations. IRE Trans. on Human Factors in Electronics, HFE-2, pages 39-55

Attitudes toward intelligent machines


This is an attempt to analyze attitudes and arguments brought forth by questions like "Can machines think?" and "Can machines exhibit intelligence?" Its purpose is to improve the climate which surrounds research in the field of machine or artificial intelligence. Its goal is not to convince those who answer the above questions negatively that they are wrong (although an attempt will be made to refute some of the negative arguments) but that they should be tolerant of research investigating these questions. The negative attitudes existent today tend to inhibit such research. Reprinted in Feigenbaum & Feldman, Computers and Thought (1963). Also in Datamation 9(3), March 1963, pp. 34-38. Symposium on Bionics, Rand Technical Report 60 600, pp. 13-19

Man-Computer Symbiosis


Man-computer symbiosis is an expected development in cooperative interaction between men and electronic computers. It will involve very close coupling between the human and the electronic members of the partnership. The main aims are 1) to let computers facilitate formulative thinking as they now facilitate the solution of formulated problems, and 2) to enable men and computers to cooperate in making decisions and controlling complex situations without inflexible dependence on predetermined programs. In the anticipated symbiotic partnership, men will set the goals, formulate the hypotheses, determine the criteria, and perform the evaluations. Computing machines will do the routinizable work that must be done to prepare the way for insights and decisions in technical and scientific thinking. Preliminary analyses indicate that the symbiotic partnership will perform intellectual operations much more effectively than man alone can perform them. Prerequisites for the achievement of the effective, cooperative association include developments in computer time sharing, in memory components, in memory organization, in programming languages, and in input and output equipment. IRE Transactions on Human Factors in Electronics, HFE-1, pp 4-11

Conditional probability computing in a nervous system


The design of classification computers is discussed in the first paper; the design of conditional probability computers is discussed in a third paper (Uttley, 1958, ref. Nervous transmission is in terms of standard impulses which meet the requirements of binary classification. However, at low levels in nervous systems, intensity is signalled in terms of impulse frequency. If, at higher levels, patterns are distinguished by classification then intensity must not be signalled in terms of frequency but in terms of 'place'.

Medical diagnosis and cybernetics


This ancient branch of knowledge is represented by the medical practitioner; we shall therefore establish the logical structure of Medicine by studying his activities. If we confine ourselves to the traditional system, the consultation consists of various parts, as follows: the questioning, the general examination, palpation, inspection, examination with instruments. Much emphasis is laid upon the value of a proper examination, a complete record of symptoms, palpation carried out gently and correctly, but no indication is given of the way in which all this material is put together. We shall call all the actions by which the doctor obtains information about his patient the "acquisition of information", which thus comprises the general examination, palpation, questioning the patient, special examinations: in brief, all the serdological and laboratory techniques.

The mechanism of habituation


W. Ross Ashby. Summary: The phenomenon of habituation, in which the response to any regularly repeated stimulus decreases, has not so far received any general mechanistic explanation. It is here shown that when any system is subjected to a regularly repeated stimulus or disturbance, the successive responses, if they change in size, do not in general tend to become larger or smaller with equal probability: there is a fundamental bias in favour of the smaller. The aim of this paper is not, in any case, to relate habituation to concepts of physiological or psychological type such as "fatigue" but to relate it to basic concepts of mechanistic type. As will be shown below, the basic phenomenon of habituation can be identified over a very wide range of systems, and only a language that can range equally widely is appropriate.