Stable Dynamic Parameter Adaption
A stability criterion for dynamic parameter adaptation is given. In the case of the learning rate of backpropagation, a class of stable algorithms is presented and studied, including a convergence proof.
1 INTRODUCTION
All but a few learning algorithms employ one or more parameters that control the quality of learning. Backpropagation has its learning rate and momentum parameter; Boltzmann learning uses a simulated annealing schedule; Kohonen learning uses a learning rate and a decay parameter; genetic algorithms use probabilities, etc. The investigator always has to set the parameters to specific values when trying to solve a certain problem. Traditionally, the metaproblem of adjusting the parameters is solved by relying on a set of values well tested on other problems, or by an intensive search for good parameter regions, restarting the experiment with different values. In this situation, a great deal of expertise and/or time for experiment design is required (as well as a huge amount of computing time).
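For orientation, the following is a minimal sketch of one well-known dynamic learning-rate scheme, the "bold driver" heuristic: grow the rate while the error keeps falling, cut it sharply when the error rises. This is an illustrative assumption for context, not the stability criterion or the algorithm class developed in the paper.

```python
import numpy as np

# Minimal "bold driver" learning-rate adaptation: an illustrative,
# well-known heuristic, NOT the stability criterion or algorithm
# class from the paper. Grow eta after a successful step, shrink it
# (and reject the step) when the error increases.
def train(loss, grad, w, eta=0.01, up=1.05, down=0.5, steps=200):
    prev = loss(w)
    for _ in range(steps):
        w_try = w - eta * grad(w)
        cur = loss(w_try)
        if cur <= prev:                # error fell: accept, grow eta
            w, prev = w_try, cur
            eta *= up
        else:                          # error rose: reject, shrink eta
            eta *= down
    return w, eta

# Toy usage: minimize f(w) = ||w||^2.
f = lambda w: float(w @ w)
df = lambda w: 2.0 * w
w, eta = train(f, df, np.array([3.0, -2.0]))
print(w, eta)
```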
A Framework for Non-rigid Matching and Correspondence
Pappu, Suguna, Gold, Steven, Rangarajan, Anand
Matching feature point sets lies at the core of many approaches to object recognition. We present a framework for non-rigid matching that begins with a skeleton module, affine point matching, then integrates multiple features to improve correspondence, and develops an object representation based on spatial regions to model local transformations.
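A minimal sketch of what an affine point-matching skeleton might look like under an ICP-style alternation (nearest-neighbor correspondence, then a least-squares affine refit). The paper's actual module, and its integration of multiple features, may differ; all names and data below are illustrative assumptions.

```python
import numpy as np

# Illustrative affine point matching by ICP-style alternation:
# nearest-neighbor correspondence, then a least-squares affine refit.
# A sketch only; the paper's skeleton module (and its feature
# integration) may use a different correspondence scheme.
def fit_affine(X, Y):
    # Least-squares solve of Y ~ X @ A.T + t.
    Xh = np.hstack([X, np.ones((len(X), 1))])
    P, *_ = np.linalg.lstsq(Xh, Y, rcond=None)
    return P[:-1].T, P[-1]                         # A (2x2), t (2,)

def affine_icp(X, Y, iters=20):
    A, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        Xt = X @ A.T + t
        d = ((Xt[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        A, t = fit_affine(X, Y[d.argmin(axis=1)])  # refit on matches
    return A, t

# Toy usage: recover a mild affine warp without known correspondences.
rng = np.random.default_rng(0)
X = rng.random((40, 2))
A_true, t_true = np.array([[1.05, 0.05], [-0.05, 0.95]]), np.array([0.1, -0.05])
A, t = affine_icp(X, X @ A_true.T + t_true)
print(np.round(A, 3), np.round(t, 3))
```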
Learning to Predict Visibility and Invisibility from Occlusion Events
Marshall, Jonathan A., Alley, Richard K., Hubbard, Robert S.
This paper presents a self-organizing neural network that learns to detect, represent, and predict the visibility and invisibility relationships that arise during occlusion events, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network develops two parallel opponent channels or "chains" of lateral excitatory connections for every resolvable motion trajectory. One channel, the "On" chain or "visible" chain, is activated when a moving stimulus is visible. The other channel, the "Off" chain or "invisible" chain, carries a persistent, amodal representation that predicts the motion of a formerly visible stimulus that becomes invisible due to occlusion. The learning rule uses disinhibition from the On chain to trigger learning in the Off chain.
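A toy rendition of the On/Off chain idea for a single rightward trajectory, assuming hand-wired lateral connections. In the paper these connections are learned, and Off-chain learning is gated by disinhibition from the On chain; everything below is an illustrative assumption, not the paper's network.

```python
import numpy as np

# Toy On/Off chains over 10 positions: the On ("visible") chain is
# stimulus-driven; the Off ("invisible") chain propagates a persistent,
# amodal prediction while the stimulus is behind an occluder.
N, T = 10, 12                     # chain length, time steps
on, off = np.zeros(N), np.zeros(N)
occluder = {5, 6, 7}              # positions hidden behind an occluder

pos = 0
for t in range(T):
    inside = pos < N
    visible = inside and pos not in occluder
    new_on, new_off = np.zeros(N), np.zeros(N)
    if visible:
        new_on[pos] = 1.0         # On chain: driven by the visible stimulus
    elif inside:
        # Off chain: lateral propagation predicts the occluded stimulus.
        prev = (on[pos - 1] + off[pos - 1]) if pos > 0 else 0.0
        new_off[pos] = 1.0 if prev > 0 else 0.0
    on, off = new_on, new_off
    state = "visible" if visible else ("occluded" if inside else "gone")
    print(f"t={t:2d} pos={pos} {state:8s} On={on.argmax() if on.any() else '-'} "
          f"Off={off.argmax() if off.any() else '-'}")
    pos += 1
```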
Neural Networks with Quadratic VC Dimension
Koiran, Pascal, Sontag, Eduardo D.
A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function f, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {0, 1} and are defined on a common domain X.
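For orientation only (a textbook statement, not a result quoted from this paper): if d is the VC dimension of F, a standard sufficient sample size for PAC learning (Blumer et al., 1989) is

```latex
% With probability at least 1 - \delta, every f \in F consistent with
% m labeled examples has error at most \epsilon, provided
m \;\ge\; \frac{c_0}{\epsilon}\left( d \log\frac{1}{\epsilon} + \log\frac{1}{\delta} \right),
\qquad d = \mathrm{VCdim}(F),
% where c_0 is a universal constant.
```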
From Isolation to Cooperation: An Alternative View of a System of Experts
Schaal, Stefan, Atkeson, Christopher G.
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross validation error using second order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive field in which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning.
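A minimal sketch of the prediction side, assuming fixed Gaussian receptive fields and ordinary weighted least squares; the paper additionally adapts each field's size and shape by minimizing a penalized local cross validation error, which is omitted here.

```python
import numpy as np

# Locally linear experts with Gaussian receptive fields. Sketch of the
# prediction side only; receptive-field adaptation is not shown.
class LocalExpert:
    def __init__(self, center, width):
        self.c, self.w = center, width
        self.beta = None

    def activation(self, x):
        d = x - self.c
        return np.exp(-0.5 * float(d @ d) / self.w ** 2)

    def fit(self, X, y):
        # Weighted least squares on [x, 1]; experts train independently.
        W = np.array([self.activation(x) for x in X])
        Xh = np.hstack([X, np.ones((len(X), 1))])
        Aw = Xh * W[:, None]
        self.beta, *_ = np.linalg.lstsq(Aw.T @ Xh, Aw.T @ y, rcond=None)

    def predict(self, x):
        return np.append(x, 1.0) @ self.beta

def blended_prediction(experts, x):
    # Cooperation happens only at query time: blend by activation.
    a = np.array([e.activation(x) for e in experts])
    p = np.array([e.predict(x) for e in experts])
    return (a * p).sum() / a.sum()

# Toy usage: fit y = sin(x) with eight independent local experts.
X = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(X[:, 0])
experts = [LocalExpert(np.array([c]), 0.8) for c in np.linspace(0, 2 * np.pi, 8)]
for e in experts:
    e.fit(X, y)
print(blended_prediction(experts, np.array([1.5])), np.sin(1.5))
```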
Discovering Structure in Continuous Variables Using Bayesian Networks
Hofmann, Reimar, Tresp, Volker
We study Bayesian networks for continuous variables using nonlinear conditional density estimators. We demonstrate that useful structures can be extracted from a data set in a self-organized way and we present sampling techniques for belief update based on Markov blanket conditional density models.
1 Introduction
One of the strongest types of information that can be learned about an unknown process is the discovery of dependencies and, even more important, of independencies. A superior example is medical epidemiology, where the goal is to find the causes of a disease and exclude factors which are irrelevant.
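A sketch of Markov-blanket-based belief update on a small continuous network, with Gaussian conditionals standing in for the paper's nonlinear conditional density estimators; the network A -> B -> C and all numbers are illustrative assumptions.

```python
import numpy as np

# Markov-blanket sampling sketch for belief update in a continuous
# Bayesian network A -> B -> C with C observed. Gaussian conditionals
# are stand-ins for nonlinear conditional density estimators.
rng = np.random.default_rng(0)

def log_gauss(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s)

c_obs = 2.0   # evidence: C is observed at 2

# A variable's conditional given its Markov blanket is proportional to
# its own conditional density times those of its children.
def log_blanket(var, a, b):
    if var == "a":
        return log_gauss(a, 0.0, 1.0) + log_gauss(b, a, 0.5)
    return log_gauss(b, a, 0.5) + log_gauss(c_obs, b, 0.5)

a, b, samples = 0.0, 0.0, []
for _ in range(5000):
    # Metropolis-within-Gibbs: local proposal per variable, accepted
    # using only the Markov-blanket conditional density.
    prop = a + 0.5 * rng.standard_normal()
    if np.log(rng.random()) < log_blanket("a", prop, b) - log_blanket("a", a, b):
        a = prop
    prop = b + 0.5 * rng.standard_normal()
    if np.log(rng.random()) < log_blanket("b", a, prop) - log_blanket("b", a, b):
        b = prop
    samples.append((a, b))
print(np.mean(samples, axis=0))   # posterior means of A, B given C = 2
```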
Learning Fine Motion by Markov Mixtures of Experts
Meila, Marina, Jordan, Michael I.
Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact (s.o.c.) between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the s.o.c. The current s.o.c. is viewed as the hidden state variable of a discrete HMM. The control-dependent transition probabilities between states are modeled as parametrized functions of the measurement. We show that their parameters can be estimated from measurements at the same time as the parameters of the movement in each s.o.c.
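A sketch of the filtering side of such a model, assuming a softmax parametrization of the control-dependent transition probabilities; emission models, shapes, and all parameter values below are illustrative, not the paper's.

```python
import numpy as np

# Filtering the state of contact (s.o.c.) with an HMM whose transition
# matrix depends on the control/measurement input u through a softmax,
# in the spirit of a Markov mixture of experts.
rng = np.random.default_rng(1)
S = 3                                        # number of contact states

def transition(u, W):
    logits = W @ np.array([u, 1.0])          # (S, S): scores per row
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # rows are valid distributions

def forward_filter(ys, us, W, mus, sigma=0.3):
    alpha = np.full(S, 1.0 / S)              # uniform prior over s.o.c.
    for y, u in zip(ys, us):
        alpha = alpha @ transition(u, W)     # control-dependent prediction
        alpha *= np.exp(-0.5 * ((y - mus) / sigma) ** 2)  # measurement likelihood
        alpha /= alpha.sum()                 # normalize belief
    return alpha

W = rng.standard_normal((S, S, 2))           # transition parameters
mus = np.array([-1.0, 0.0, 1.0])             # per-state measurement means
print(forward_filter([0.9, 1.1, 1.0], [0.2, 0.1, 0.0], W, mus))
```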
A Neural Network Classifier for the I1000 OCR Chip
Platt, John C., Allen, Timothy P.
Therefore, we want c to be less than 0.5. In order to get a 2:1 margin, we choose c = 0.25. The classifier is trained only on individual partial characters instead of all possible combinations of partial characters. We can therefore specify the classifier using only 1523 constraints, instead of creating a training set of approximately 128,000 possible combinations of partial characters. Applying these constraints is much faster than back-propagation on the entire data set.
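A guess at the flavor of constraint-based training with margin c (an illustrative assumption, not the chip's actual procedure): each individual partial-character constraint asks a linear unit's output to clear the 0.5 threshold by the margin implied by c, and violated constraints trigger perceptron-style corrections; the data, dimensions, and update rule below are made up for illustration.

```python
import numpy as np

# Illustrative constraint-satisfaction training: enforce each
# individual constraint rather than enumerating all combinations.
rng = np.random.default_rng(2)
c = 0.25                          # 2:1 margin relative to the 0.5 threshold

X = rng.random((200, 16))         # one row per individual constraint
s = rng.choice([-1.0, 1.0], 200)  # required side of the threshold

w = np.zeros(16)
for _ in range(100):              # cycle until constraints (mostly) hold
    violated = 0
    for x, side in zip(X, s):
        if side * (w @ x - 0.5) < 0.5 - c:  # constraint not satisfied
            w += 0.1 * side * x             # push output to the right side
            violated += 1
    if violated == 0:
        break
print("remaining violations:", violated)
```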
A Model of Auditory Streaming
McCabe, Susan L., Denham, Michael J.
The formation of associations between signals which are considered to arise from the same external source allows the organism to recognise significant patterns and relationships within the signals from each source, without being confused by accidental coincidences between unrelated signals (Bregman, 1990). The intrinsically temporal nature of sound means that, in addition to being able to focus on the signal of interest, perhaps of equal significance is the ability to predict how that signal is expected to progress; such expectations can then be used to facilitate further processing of the signal. It is important to remember that perception is a creative act (Luria, 1980). The organism creates its interpretation of the world in response to the current stimuli, within the context of its current state of alertness, attention, and previous experience. The creative aspects of perception are exemplified in the auditory system, where peripheral processing decomposes acoustic stimuli.