Principles of Risk Minimization for Learning Theory

Neural Information Processing Systems

Learning is posed as a problem of function estimation, for which two principles of solution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements of the function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.
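As a rough illustration of the first of these principles, the sketch below minimizes an empirical risk, the average loss over a finite training sample, for a linear predictor with squared loss; the data, model class, and loss are hypothetical and chosen only to make the idea concrete.

    import numpy as np

    # Empirical risk: the average loss of f(x, w) = x @ w over the sample.
    def empirical_risk(w, X, y):
        return np.mean((X @ w - y) ** 2)

    # Toy data (hypothetical). For squared loss over a linear class, the
    # empirical risk minimizer is the least-squares solution.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

    w_erm, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("empirical risk at the ERM solution:", empirical_risk(w_erm, X, y))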


Connectionist Optimisation of Tied Mixture Hidden Markov Models

Neural Information Processing Systems

Issues relating to the estimation of hidden Markov model (HMM) local probabilities are discussed. In particular, we note the isomorphism of radial basis function (RBF) networks to tied mixture density modelling; additionally, we highlight the differences between these methods arising from the different training criteria employed. We present a method in which connectionist training can be modified to resolve these differences and discuss some preliminary experiments. Finally, we discuss some outstanding problems with discriminative training.
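The isomorphism noted above can be made concrete with a small sketch: in a tied-mixture HMM each state's observation density is a weighted sum over a shared pool of Gaussians, and an RBF network with Gaussian hidden units computes the same functional form, with the basis functions playing the role of the tied components and the hidden-to-output weights playing the role of the mixture coefficients. The shapes and values below are illustrative only, not the models used in the paper.

    import numpy as np

    def gaussian_basis(x, means, var):
        # Shared pool of spherical Gaussians: one activation per tied component.
        d = x[None, :] - means                        # (K, dim)
        return np.exp(-0.5 * np.sum(d * d, axis=1) / var)

    def rbf_state_scores(x, means, var, weights):
        # weights: (num_states, K). Mixture coefficients in the HMM view,
        # hidden-to-output weights in the RBF view.
        return weights @ gaussian_basis(x, means, var)

    rng = np.random.default_rng(0)
    means = rng.normal(size=(8, 13))                  # 8 tied components, 13-dim features
    weights = rng.random(size=(4, 8))                 # 4 states share the same components
    print(rbf_state_scores(rng.normal(size=13), means, var=1.0, weights=weights))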


Networks for the Separation of Sources that are Superimposed and Delayed

Neural Information Processing Systems

We have created new networks to unmix signals which have been mixed either with time delays or via filtering. We first show that a subset of the Herault-Jutten learning rules fulfills a principle of minimum output power. We then apply this principle to extensions of the Herault-Jutten network which have delays in the feedback path. Our networks perform well on real speech and music signals that have been mixed using time delays or filtering.
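A minimal sketch of the kind of delayed feedback separator described above, written under simplifying assumptions (two channels, a single known cross delay, instantaneous gains): each output subtracts a delayed, scaled copy of the other, and the cross weights are adapted by a stochastic step that reduces instantaneous output power, one loose reading of the minimum-output-power principle. This is not the authors' exact network or learning rule.

    import numpy as np

    def separate(x1, x2, d, lr=1e-3, passes=20):
        # Two-channel feedback separator with cross-feedback delayed by d samples.
        n = len(x1)
        c12, c21 = 0.0, 0.0
        y1, y2 = x1.copy(), x2.copy()
        for _ in range(passes):
            for t in range(d, n):
                y1[t] = x1[t] - c12 * y2[t - d]
                y2[t] = x2[t] - c21 * y1[t - d]
                # Descent step on the instantaneous output power with respect
                # to the cross-feedback weights.
                c12 += lr * y1[t] * y2[t - d]
                c21 += lr * y2[t] * y1[t - d]
        return y1, y2, c12, c21

    # Toy delayed mixture: the learned weights should move toward the mixing gains.
    rng = np.random.default_rng(1)
    s1, s2, d = rng.normal(size=4000), rng.normal(size=4000), 5
    x1 = s1.copy(); x1[d:] += 0.6 * s2[:-d]
    x2 = s2.copy(); x2[d:] += 0.5 * s1[:-d]
    print(separate(x1, x2, d)[2:])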


A Computational Mechanism to Account for Averaged Modified Hand Trajectories

Neural Information Processing Systems

Using the double-step target displacement paradigm, the mechanisms underlying arm trajectory modification were investigated. With short (10-110 msec) inter-stimulus intervals, the resulting hand motions were initially directed between the first and second target locations. The kinematic features of the modified motions were accounted for by the superposition scheme, which involves the vectorial addition of two independent point-to-point motion units: one for moving the hand toward an internally specified location, and a second for moving between that location and the final target location. The similarity between the inferred internally specified locations and previously reported measured endpoints of the first saccades in double-step eye-movement studies may suggest similarities between perceived target locations in eye and hand motor control.
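A hedged sketch of the superposition idea: the modified trajectory is formed by vectorially adding two point-to-point motion units, the first aimed at the internally specified location and the second, starting after a short delay, moving from that location to the displaced target. The specific positional profile (minimum jerk) and all numerical values below are assumptions for illustration, not the paper's fitted parameters.

    import numpy as np

    def point_to_point(start, end, t0, dur, t):
        # Minimum-jerk displacement from start to end, beginning at t0, lasting dur.
        tau = np.clip((t - t0) / dur, 0.0, 1.0)
        shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
        return start + (end - start) * shape[:, None]

    t = np.linspace(0.0, 1.0, 200)
    origin = np.array([0.0, 0.0])
    internal_target = np.array([10.0, 0.0])   # internally specified location
    final_target = np.array([10.0, 8.0])      # displaced (second) target

    unit1 = point_to_point(origin, internal_target, t0=0.0, dur=0.6, t=t)
    unit2 = point_to_point(np.zeros(2), final_target - internal_target,
                           t0=0.15, dur=0.6, t=t)
    trajectory = unit1 + unit2                # vectorial addition of the two units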


Threshold Network Learning in the Presence of Equivalences

Neural Information Processing Systems

This paper applies the theory of Probably Approximately Correct (PAC) learning to multiple output feedforward threshold networks in which the weights conform to certain equivalences. It is shown that the sample size for reliable learning can be bounded above by a formula similar to that required for single output networks with no equivalences. The best previously obtained bounds are improved for all cases.



Improved Hidden Markov Model Speech Recognition Using Radial Basis Function Networks

Neural Information Processing Systems

The RBF network consists of an input layer, a hidden layer composed of Gaussian basis functions, and an output layer. Connections from the input layer to the hidden layer are fixed at unity while those from the hidden layer to the output layer are trained by minimizing the overall mean-square error between actual and desired output values. Each RBF output node has a corresponding state in a set of HMM word models which represent the words in the vocabulary. HMM word models are left-to-right with no skip states and have a one-state background noise model at either end. The background noise models are identical for all words.
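The layer structure just described can be sketched compactly: inputs pass unchanged to a layer of Gaussian basis functions, and only the hidden-to-output weights are fit by minimizing the mean-square error between actual and desired outputs (solved here in closed form by least squares). The centers, width, feature dimensions, and number of states below are placeholders rather than the paper's settings.

    import numpy as np

    def gaussian_hidden(X, centers, width):
        # Activations of the Gaussian basis functions for each input frame.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * width**2))

    def fit_output_weights(X, targets, centers, width):
        H = gaussian_hidden(X, centers, width)            # hidden activations
        W, *_ = np.linalg.lstsq(H, targets, rcond=None)   # minimize ||H W - targets||^2
        return W

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))                         # e.g. acoustic feature frames
    targets = np.eye(10)[rng.integers(0, 10, size=500)]    # one output node per HMM state
    centers = X[rng.choice(500, size=32, replace=False)]   # placeholder basis centers
    W = fit_output_weights(X, targets, centers, width=1.5)
    scores = gaussian_hidden(X, centers, width=1.5) @ W    # per-state RBF outputs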


Oscillatory Model of Short Term Memory

Neural Information Processing Systems

It seems quite natural to assume that the limited capacity of short term memory (STM) is due to its special dynamical nature. Recently, Crick and Koch (1990) suggested that working memory is functionally related to the binding process and is obtained via synchronized oscillations of neural populations. The capacity limitation of STM may then result from the competition between oscillations representing items in STM. In the model we investigate, this is indeed the case.


Learning to Segment Images Using Dynamic Feature Binding

Neural Information Processing Systems

Despite the fact that complex visual scenes contain multiple, overlapping objects, people perform object recognition with ease and accuracy. One operation that facilitates recognition is an early segmentation process in which features of objects are grouped and labeled according to which object they belong. Current computational systems that perform this operation are based on predefined grouping heuristics.


A Network of Localized Linear Discriminants

Neural Information Processing Systems

The localized linear discriminant network (LLDN) has been designed to address classification problems containing relatively closely spaced data from different classes (encounter zones [1], the accuracy problem [2]). Locally trained hyperplane segments are an effective way to define the decision boundaries for these regions [3]. The LLD uses a modified perceptron training algorithm for effective discovery of separating hyperplane/sigmoid units within narrow boundaries. The basic unit of the network is the discriminant receptive field (DRF), which combines the LLD function with Gaussians representing the dispersion of the local training data with respect to the hyperplane. The DRF implements a local distance measure [4] and obtains the benefits of networks of localized units [5]. A constructive algorithm for the two-class case is described which incorporates DRFs into the hidden layer to solve local discrimination problems. The output unit produces a smoothed, piecewise linear decision boundary. Preliminary results indicate the ability of the LLDN to efficiently achieve separation when boundaries are narrow and complex, in cases where both the "standard" multilayer perceptron (MLP) and k-nearest neighbor (KNN) yield high error rates on training data.

1 The LLD Training Algorithm and DRF Generation

The LLD is defined by the hyperplane normal vector V and its "midpoint" M (a translated origin [1] near the center of gravity of the training data in feature space).
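A hedged sketch of a single DRF consistent with the description above: the localized linear discriminant is defined by its normal vector V and midpoint M, its sigmoid output is gated by a Gaussian capturing the spread of the local training data around the hyperplane, and the product acts as a local distance-like response. The exact combination rule and parameterization are assumptions for illustration, not the paper's algorithm.

    import numpy as np

    def drf_response(x, V, M, sigma):
        # Discriminant receptive field: sigmoid of the local linear discriminant,
        # gated by a Gaussian that confines the unit to its local region.
        n = V / np.linalg.norm(V)
        diff = x - M
        along = diff @ n                              # signed distance to the hyperplane
        in_plane = diff - along * n                   # component parallel to the hyperplane
        discriminant = 1.0 / (1.0 + np.exp(-along))   # LLD output through a sigmoid
        locality = np.exp(-0.5 * (in_plane @ in_plane) / sigma**2)
        return discriminant * locality

    x = np.array([1.0, 2.0, 0.5])
    print(drf_response(x, V=np.array([1.0, 0.0, 0.0]), M=np.zeros(3), sigma=2.0))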