If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
We present an algorithm for creating a neural network which produces accurateprobability estimates as outputs. The network implements aGibbs probability distribution model of the training database. This model is created by a new transformation relating the joint probabilities of attributes in the database to the weights (Gibbs potentials) of the distributed network model. The theory of this transformation is presented together with experimental results. Oneadvantage of this approach is the network weights are prescribed without iterative gradient descent. Used as a classifier the network tied or outperformed published results on a variety of databases.
Genevieve B. Orr and Todd K. Leen Department of Computer Science and Engineering Oregon Graduate Institute of Science & Technology 19600 N.W. von Neumann Drive Beaverton, OR 97006-1999 Abstract In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. We summarize the theory of the time evolution of P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning. Despite the recent application of convergence theorems from stochastic approximation theoryto neural network learning (Oja 1982, White 1989) there remain outstanding questionsabout the search dynamics in stochastic learning.
We present the information-theoretic derivation of a learning algorithm that clusters unlabelled data with linear discriminants. In contrast to methods that try to preserve information about the input patterns, we maximize the information gained from observing the output of robust binary discriminators implemented with sigmoid nodes. We deri ve a local weight adaptation rule via gradient ascent in this objective, demonstrate its dynamics on some simple data sets, relate our approach to previous work and suggest directions in which it may be extended.
In visual processing the ability to deal with missing and noisy information iscrucial. Occlusions and unreliable feature detectors often lead to situations where little or no direct information about features is available. Howeverthe available information is usually sufficient to highly constrain the outputs. We discuss Bayesian techniques for extracting class probabilities given partial data. The optimal solution involves integrating overthe missing dimensions weighted by the local probability densities. We show how to obtain closed-form approximations to the Bayesian solution using Gaussian basis function networks.
The classical computational model for stereo vision incorporates a uniqueness inhibition constraint to enforce a one-to-one feature match, thereby sacrificing the ability to handle transparency. Critics ofthe model disregard the uniqueness constraint and argue that the smoothness constraint can provide the excitation support required for transparency computation.
We present a model of how MT cells aggregate responses from VI to form such a velocity representation. Two different sets of units, with local receptive fields, receive inputs from motion energy filters. One set of units forms estimates of local motion, while the second set computes the utility of these estimates. Outputs from this second set of units "gate" the outputs from the first set through a gain control mechanism. This active process for selecting only a subset of local motion responses to integrate into more global responses distinguishes our model from previous models of velocity estimation.
The invariance of an objects' identity as it transformed over time provides a powerful cue for perceptual learning. We present an unsupervised learningprocedure which maximizes the mutual information between the representations adopted by a feed-forward network at consecutive time steps. We demonstrate that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories. Thesame learning procedure should be widely applicable to a variety of perceptual learning tasks. 1 INTRODUCTION A promising approach to understanding human perception is to try to model its developmental stages. There is ample evidence that much of perception is learned.
First, the membership functions and an initial rule representation are learned; second, the rules are compressed as much as possible using information theory; and finally, a computational networkis constructed to compute the function value. This system is applied to two control examples: learning the truck and trailer backer-upper control system, and learning a cruise control systemfor a radio-controlled model car. 1 Introduction Function approximation is the problem of estimating a function from a set of examples ofits independent variables and function value. If there is prior knowledge of the type of function being learned, a mathematical model of the function can be constructed and the parameters perturbed until the best match is achieved. However, ifthere is no prior knowledge of the function, a model-free system such as a neural network or a fuzzy system may be employed to approximate an arbitrary nonlinear function. A neural network's inherent parallel computation is efficient for speed; however, the information learned is expressed only in the weights of the network. The advantage of fuzzy systems over neural networks is that the information learnedis expressed in terms of linguistic rules. In this paper, we propose a method for learning a complete fuzzy system to approximate example data.
The action network is embedded in a sensorymotoric systemarchitecture that contains a separate world model. It is continuously fed with short-term predicted spatiotemporal obstacle trajectories, and receives robot state feedback. The action netallows for external switching between alternative planning tasks.It generates goal-directed motor actions - subject to the robot's kinematic and dynamic constraints - such that collisions withmoving obstacles are avoided. Using supervised learning, we distribute examples of the optimal planner mapping over a structure-level adapted parsimonious higher order network. The training database is generated by a Dynamic Programming algorithm. Extensivesimulations reveal, that the local planner mapping is highly nonlinear, but can be effectively and sparsely represented bythe chosen powerful net model. Excellent generalization occurs for unseen obstacle configurations. We also discuss the limitations offeed-forward neurocontrol for growing planning horizons.