Technology
PRODIGY: An integrated architecture for planning and learning
Carbonell, J. G. | Knoblock, C. A. | Minton, S.
Artificial intelligence has progressed to the point where multiple cognitive capabilities are being integrated into computational architectures, such as SOAR, PRODIGY, THEO, and ICARUS. Learning in PRODIGY occurs at all decision points and integration in PRODIGY is at the knowledge level; the learning and reasoning modules produce mutually interpretable knowledge structures. Issues in architectural design are discussed, providing a context to examine the underlying tenets of the PRODIGY architecture.
Engineering approach to building complete, intelligent beings
Rather than tackle isolated aspects of Rather than tackle isolated aspects of human-level intelligence the mobile robot group at MIT has been working bottom up trying to build complete insect-level intelligent systems for mobile robots. The robots are situated in ordinary people-populated office and laboratory areas and must go about their business in an unstructured dynamically changing environment. Traditional AI techniques make such unrealistic assumptions on the perceptual and actuation systems that they are not much use for such an endeavour. We have developed a different approach, based on task achieving behaviors, rather than information processing components, as the fundamental unit of reduction of a complete intelligent system. We have built a series of complete creatures (Allen, Herbert, Tom and Jerry, Genghis, and now Seymour under construction) which exist in and interact with the world.
Classifier systems and genetic algorithms
Booker, L. B. | Goldberg, D. E. | Holland, J. H.
ABSTRACT Classifier systems are massively parallel, message-passing, rule-based systems that learn through credit assignment (the bucket brigade algorithm) and rule discovery (the genetic algorithm). They typically operate in environments that exhibit one or more of the following characteristics: (1) perpetually novel events accompanied by large amounts of noisy or irrelevant data; (2) continual, often real-time, requirements for action; (3) implicitly or inexactly defined goals; and (4) sparse payoff or reinforcement obtainable only through long action sequences. Classifier systems are designed to absorb new information continuously from such environments, devising sets of compet- ing hypotheses (expressed as rules) without disturbing significantly capabilities already acquired. This paper reviews the definition, theory, and extant applications of classifier systems, comparing them with other machine learning techniques, and closing with a discussion of advantages, problems, and possible extensions of classifier systems. Artificial Intelligence, 40 (1-3), 235-82.
Learnability and the Vapnik-Chervonenkis dimension
Blumer, A. | Ehrenfeucht, A. | Haussler, D. | Warmuth, M.
In this paper we extend Valiant's model to learning concepts defined by regions in Euclidean n-dimensional space E", n 2 1. The general techniques we develop lead to new results in Boolean domains as well. Our methods are based on the pioneering work of Vapnik and Chervonenkis [6 I-631 on the distribution-free convergence of empirical probability estimates and its application to the theory of pattern recognition. These methods provide a unified treatment of some of Valiant's results, and extend previous results of Pearl [50, 5 I] and Devroye and Wagner ([ 151, see also [ 141) along with our results from [lo]. In learning a class C of concepts (e.g., subsets of E") from examples, a single target concept is selected from C and we are given a finite sequence of points in E", each labeled " 1" if it is in the target concept (a positive example) and "0" if it is not (a negative example). This set is called a sample of the target concept. A learning function for C is a function that, given a large enough randoml:y drawn sample of any target concept in C, returns a region in E" (a hypothesis) that is with high probability a good approximation to the target concept. More precisely: (1) We let P be a fixed probability distribution on E" and assume that examples are created by drawing points independently at random according to P. (2) The error of a hypothesis is taken to be the probability that it disagrees with the target concept on a randomly drawn example, that is, the error is just the probability (according to P) of the symmetric difference between the hypothesis and the target concept.
An English language question answering system for a large relational data base
By typing requests in English, casual users will be able to obtain explicit answers from a large relational database of aircraft flight and maintenance data using a system called PLANES. The design and implementation of this system is described and illustrated with detailed examples of the operation of system components and examples of overall system operation. The language processing portion of the system uses a number of augmented transition networks, each of which matches phrases with a specific meaning, along with context registers (history keepers) and concept case frames; these are used for judging meaningfulness of questions, generating dialogue for clarifying partially understood questions, and resolving ellipsis and pronoun reference problems. Other system components construct a formal query for the relational database, and optimize the order of searching relations. Methods are discussed for handling vague or complex questions and for providing browsing ability.
What size net gives valid generalization?
We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. We show that if m O(W/ log N/) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 /2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 fraction of the future test examples.
Graphical models for associations between variables, some of which are qualitative and some quantitative
The Annals of Statistics publishes research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The discipline of statistics has deep roots in both mathematics and in substantive scientific fields. Mathematics provides the language in which models and the properties of statistical methods are formulated. It is essential for rigor, coherence, clarity and understanding. Consequently, our policy is to continue to play a special role in presenting research at the forefront of mathematical statistics, especially theoretical advances that are likely to have a significant impact on statistical methodology or understanding.