Machine Learning
Hybrid NN/HMM-Based Speech Recognition with a Discriminant Neural Feature Extraction
Willett, Daniel, Rigoll, Gerhard
In this paper, we present a novel hybrid architecture for continuous speech recognition systems. It consists of a continuous HMM system extended by an arbitrary neural network that is used as a preprocessor, taking several frames of the feature vector as input to produce more discriminative feature vectors with respect to the underlying HMM system. This hybrid system is an extension of a state-of-the-art continuous HMM system and, in fact, it is the first hybrid system that is really capable of outperforming these standard systems with respect to recognition accuracy. Experimental results show a relative error reduction of about 10%, achieved on a remarkably good recognition system based on continuous HMMs for the Resource Management 1000-word continuous speech recognition task.
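As a rough illustration of the preprocessing step described in this abstract, the sketch below (not the authors' implementation; all layer sizes, weight matrices and function names are invented for the example) maps a window of consecutive acoustic feature frames through a small feed-forward network to produce one transformed feature vector per frame, which a conventional continuous HMM system would then score.

    # Minimal sketch: a small feed-forward network maps a window of 2*C+1
    # consecutive feature frames to a new feature vector for the HMM front end.
    import numpy as np

    def extract_features(frames, W1, b1, W2, b2, context=3):
        """frames: (T, d) array of acoustic feature vectors."""
        T, d = frames.shape
        padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
        out = []
        for t in range(T):
            window = padded[t:t + 2 * context + 1].ravel()   # stack context frames
            hidden = np.tanh(W1 @ window + b1)               # nonlinear hidden layer
            out.append(W2 @ hidden + b2)                     # transformed feature vector
        return np.asarray(out)

    # Illustrative dimensions: 13-dim features, 7-frame window, 32 hidden units
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((100, 13))
    W1, b1 = rng.standard_normal((32, 13 * 7)) * 0.1, np.zeros(32)
    W2, b2 = rng.standard_normal((13, 32)) * 0.1, np.zeros(13)
    features = extract_features(frames, W1, b1, W2, b2)      # (100, 13), fed to the HMM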
Using Helmholtz Machines to Analyze Multi-channel Neuronal Recordings
Sa, Virginia R. de, DeCharms, R. Christopher, Merzenich, Michael
One of the current challenges to understanding neural information processing in biological systems is to decipher the "code" carried by large populations of neurons acting in parallel. We present an algorithm for automated discovery of stochastic firing patterns in large ensembles of neurons. The algorithm, from the "Helmholtz Machine" family, attempts to predict the observed spike patterns in the data. The model consists of an observable layer which is directly activated by the input spike patterns, and hidden units that are activated through ascending connections from the input layer. The hidden unit activity can be propagated down to the observable layer to create a prediction of the data pattern that produced it.
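The following minimal sketch, with made-up weight matrices R and G, illustrates the two passes the abstract mentions: an ascending (recognition) pass that activates binary hidden units from an observed spike pattern, and a descending (generative) pass that turns hidden activity back into a predicted firing probability for each recorded neuron. It is a schematic of the Helmholtz-machine idea, not the fitted model from the paper.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def recognize(spikes, R, r_bias, rng):
        """Ascending pass: sample binary hidden causes given an observed spike vector."""
        p_hidden = sigmoid(R @ spikes + r_bias)
        return (rng.random(p_hidden.shape) < p_hidden).astype(float)

    def generate(hidden, G, g_bias):
        """Descending pass: predicted firing probability for each observed neuron."""
        return sigmoid(G @ hidden + g_bias)

    rng = np.random.default_rng(1)
    n_obs, n_hid = 20, 5                      # 20 recorded neurons, 5 hidden patterns
    spikes = (rng.random(n_obs) < 0.2).astype(float)
    R, r_bias = rng.standard_normal((n_hid, n_obs)) * 0.1, np.zeros(n_hid)
    G, g_bias = rng.standard_normal((n_obs, n_hid)) * 0.1, np.zeros(n_obs)

    h = recognize(spikes, R, r_bias, rng)
    prediction = generate(h, G, g_bias)       # compare with the observed spike pattern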
From Regularization Operators to Support Vector Kernels
Smola, Alex J., Schölkopf, Bernhard
Support Vector (SV) Machines for pattern recognition, regression estimation and operator inversion exploit the idea of transforming into a high dimensional feature space where they perform a linear algorithm. Instead of evaluating this map explicitly, one uses Hilbert Schmidt Kernels k(x, y) which correspond to dot products of the mapped data in high dimensional space, i.e. k(x, y) = (Φ(x) · Φ(y)) with Φ: ℝ^n → F denoting the map into feature space. Mostly, this map and many of its properties are unknown. Even worse, so far no general rule was available.
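The identity k(x, y) = (Φ(x) · Φ(y)) can be checked concretely for the homogeneous polynomial kernel of degree two, where the feature map is the vector of all ordered products x_i x_j; the snippet below (an illustration added here, not from the paper) verifies the equality numerically.

    import numpy as np

    def phi(x):
        """Explicit degree-2 feature map: all ordered monomials x_i * x_j."""
        return np.outer(x, x).ravel()

    rng = np.random.default_rng(2)
    x, y = rng.standard_normal(4), rng.standard_normal(4)

    k_implicit = np.dot(x, y) ** 2            # kernel evaluated in input space
    k_explicit = np.dot(phi(x), phi(y))       # dot product in feature space
    assert np.isclose(k_implicit, k_explicit)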
Data-Dependent Structural Risk Minimization for Perceptron Decision Trees
Shawe-Taylor, John, Cristianini, Nello
Using displays of line orientations taken from Wolfe's experiments [1992], we study the hypothesis that the distinction between parallel versus serial processes arises from the availability of global information in the internal representations of the visual scene. The model operates in two phases. First, the visual displays are compressed via principal-component-analysis. Second, the compressed data is processed by a target detector module in order to identify the existence of a target in the display. Our main finding is that targets in displays which were found experimentally to be processed in parallel can be detected by the system, while targets in experimentally-serial displays cannot. This fundamental difference is explained via variance analysis of the compressed representations, providing a numerical criterion distinguishing parallel from serial displays. Our model yields a mapping of response-time slopes that is similar to Duncan and Humphreys's "search surface" [1989], providing an explicit formulation of their intuitive notion of feature similarity. It presents a neural realization of the processing that may underlie the classical metaphorical explanations of visual search.
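A very rough sketch of the two-phase pipeline (PCA compression followed by a detector operating on the compressed codes) might look as follows; the variance-threshold detector and all array shapes are stand-ins chosen for the example, not the paper's trained target-detector module.

    import numpy as np

    def pca_compress(displays, n_components):
        """displays: (N, d) matrix, one flattened display per row."""
        mean = displays.mean(axis=0)
        centered = displays - mean
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        components = Vt[:n_components]
        return centered @ components.T, mean, components

    rng = np.random.default_rng(3)
    displays = rng.standard_normal((200, 64))          # e.g. 8x8 orientation maps
    codes, mean, comps = pca_compress(displays, n_components=8)

    # Hypothetical detector: flag a display whose compressed code has unusually
    # high variance relative to the population of codes.
    score = codes.var(axis=1)
    threshold = score.mean() + 2 * score.std()
    has_target = score > threshold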
Stacked Density Estimation
Smyth, Padhraic, Wolpert, David
The component g_j's are usually relatively simple unimodal densities such as Gaussians. Density estimation with mixtures involves finding the locations, shapes, and weights of the component densities from the data (using, for example, the Expectation-Maximization (EM) procedure). Kernel density estimation can be viewed as a special case of mixture modeling where a component is centered at each data point, given a weight of 1/N, and a common covariance structure (kernel shape) is estimated from the data. The quality of a particular probabilistic model can be evaluated by an appropriate scoring rule on independent out-of-sample data, such as the test set log-likelihood (also referred to as the log-scoring rule in the Bayesian literature).
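As a small illustration of the special case mentioned above, the sketch below (function and parameter names are invented for the example) evaluates a one-dimensional Gaussian kernel density estimate as a mixture with one component per training point, each weighted 1/N with a shared bandwidth, and scores it by the log-likelihood it assigns to held-out data.

    import numpy as np

    def kde_log_likelihood(train, test, bandwidth):
        """1-D Gaussian KDE scored by the test-set log-likelihood (log-scoring rule)."""
        diffs = (test[:, None] - train[None, :]) / bandwidth       # (n_test, n_train)
        comp = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
        density = comp.mean(axis=1)                                # 1/N mixture weights
        return np.log(density).sum()

    rng = np.random.default_rng(4)
    train = rng.normal(0.0, 1.0, size=100)
    test = rng.normal(0.0, 1.0, size=50)
    print(kde_log_likelihood(train, test, bandwidth=0.5))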
A Solution for Missing Data in Recurrent Neural Networks with an Application to Blood Glucose Prediction
Tresp, Volker, Briegel, Thomas
We consider neural network models for stochastic nonlinear dynamical systems where measurements of the variable of interest are only available at irregular intervals, i.e. most realizations are missing. Difficulties arise since the solutions for prediction and maximum likelihood learning with missing data lead to complex integrals, which even for simple cases cannot be solved analytically. In this paper we propose a specific combination of a nonlinear recurrent neural predictive model and a linear error model which leads to tractable prediction and maximum likelihood adaptation rules. In particular, the recurrent neural network can be trained using the real-time recurrent learning rule and the linear error model can be trained by an EM adaptation rule, implemented using forward-backward Kalman filter equations. The model is applied to predict the glucose/insulin metabolism of a diabetic patient where blood glucose measurements are only available a few times a day at irregular intervals.
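One ingredient of this setup can be illustrated in isolation: a linear-Gaussian state-space model handles irregularly spaced measurements by running the Kalman time update at every step and applying the measurement update only where an observation exists. The scalar filter below is such a hedged illustration with arbitrary parameter values; the paper's model additionally couples the linear error model with a nonlinear recurrent neural predictor, which is not shown here.

    import numpy as np

    def kalman_filter_missing(y, a=0.9, q=0.1, r=0.2, x0=0.0, p0=1.0):
        """Scalar Kalman filter; entries of y that are None count as missing."""
        x, p, estimates = x0, p0, []
        for obs in y:
            x, p = a * x, a * a * p + q          # time update (always)
            if obs is not None:                  # measurement update (when available)
                k = p / (p + r)
                x, p = x + k * (obs - x), (1 - k) * p
            estimates.append(x)
        return np.array(estimates)

    y = [0.5, None, None, 0.2, None, -0.1, None, None, 0.4]
    print(kalman_filter_missing(y))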
Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction
Kappen, Hilbert J., Ortiz, Francisco de Borja Rodríguez
We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons. We conclude that 1) the method works well for paramagnetic problems, 2) the TAP correction gives a significant improvement over the Weiss mean field theory, both for paramagnetic and spin glass problems, and 3) the inclusion of diagonal weights improves the Weiss approximation for paramagnetic problems, but not for spin glass problems.
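Under the first-order (Weiss) approximation, the two computational steps described above can be sketched as follows: solve the mean-field equations m_i = tanh(sum_j w_ij m_j + theta_i) by fixed-point iteration, then obtain the correlations from the linear response (susceptibility) matrix, whose inverse is delta_ij / (1 - m_i^2) - w_ij; the matrix inversion is the cubic step. The code below is a schematic illustration with small random weights, not the experimental setup of the paper.

    import numpy as np

    def mean_field(w, theta, n_iter=200):
        """Fixed-point iteration of the Weiss mean-field equations."""
        m = np.zeros(len(theta))
        for _ in range(n_iter):
            m = np.tanh(w @ m + theta)
        return m

    def linear_response_correlations(w, theta):
        m = mean_field(w, theta)
        a = np.diag(1.0 / (1.0 - m**2)) - w      # inverse susceptibility matrix
        chi = np.linalg.inv(a)                   # O(n^3) linear response correction
        return m, chi + np.outer(m, m)           # approximate <s_i s_j>

    rng = np.random.default_rng(5)
    n = 10
    w = rng.standard_normal((n, n)) * 0.1
    w = (w + w.T) / 2
    np.fill_diagonal(w, 0.0)
    theta = rng.standard_normal(n) * 0.1
    m, corr = linear_response_correlations(w, theta)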
A Simple and Fast Neural Network Approach to Stereovision
A neural network approach to stereovision is presented based on aliasing effects of simple disparity estimators and a fast coherence detection scheme. Within a single network structure, a dense disparity map with an associated validation map and, additionally, the fused cyclopean view of the scene are available. The network operations are based on simple, biologically plausible circuitry; the algorithm is fully parallel and non-iterative. Humans experience the three-dimensional world not as it is seen by either their left or right eye, but from the position of a virtual cyclopean eye, located in the middle between the two real eye positions. The different perspectives between the left and right eyes cause slight relative displacements of objects in the two retinal images (disparities), which make a simple superposition of both images without diplopia impossible. Proper fusion of the retinal images into the cyclopean view requires the registration of both images to a common coordinate system, which in turn requires calculation of disparities for all image areas which are to be fused.
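For readers unfamiliar with disparity estimation, the generic block-matching sketch below gives a baseline notion of what a disparity estimator computes along one scanline; it is a stand-in added here for illustration and is not the paper's aliasing-based estimator or its coherence-detection network.

    import numpy as np

    def scanline_disparity(left_row, right_row, max_disp=8, half_win=3):
        """For each pixel, the horizontal shift minimizing the windowed match error."""
        n = len(left_row)
        disparity = np.zeros(n, dtype=int)
        for x in range(half_win, n - half_win):
            patch = left_row[x - half_win:x + half_win + 1]
            best, best_err = 0, np.inf
            for d in range(0, min(max_disp, x - half_win) + 1):
                cand = right_row[x - d - half_win:x - d + half_win + 1]
                err = np.sum((patch - cand) ** 2)
                if err < best_err:
                    best, best_err = d, err
            disparity[x] = best
        return disparity

    rng = np.random.default_rng(6)
    right = rng.standard_normal(64)
    left = np.roll(right, 3)                     # scene shifted by 3 pixels
    print(scanline_disparity(left, right))       # mostly 3 away from the borders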
Using Expectation to Guide Processing: A Study of Three Real-World Applications
In many real world tasks, only a small fraction of the available inputs are important at any particular time. This paper presents a method for ascertaining the relevance of inputs by exploiting temporal coherence and predictability. The method proposed in this paper dynamically allocates relevance to inputs by using expectations of their future values. As a model of the task is learned, the model is simultaneously extended to create task-specific predictions of the future values of inputs. Inputs which are either not relevant, and therefore not accounted for in the model, or which contain noise, will not be predicted accurately. These inputs can be de-emphasized and, in turn, a new, improved model of the task can be created.
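A minimal sketch of the relevance idea, with an invented linear predictor and weighting rule standing in for the learned, task-specific predictions described above: each input channel is predicted from the previous time step, a running prediction error is maintained per channel, and channels that are predicted poorly are down-weighted before being passed to the task model.

    import numpy as np

    def relevance_weights(inputs, predictor, decay=0.9):
        """inputs: (T, d) sequence; predictor maps x[t-1] -> expected x[t]."""
        T, d = inputs.shape
        err = np.zeros(d)
        weights = np.ones((T, d))
        for t in range(1, T):
            expected = predictor(inputs[t - 1])
            err = decay * err + (1 - decay) * (inputs[t] - expected) ** 2
            weights[t] = 1.0 / (1.0 + err)       # poorly predicted inputs get small weight
        return weights

    rng = np.random.default_rng(7)
    T, d = 200, 4
    x = np.zeros((T, d))
    for t in range(1, T):
        x[t, :2] = 0.95 * x[t - 1, :2] + 0.05 * rng.standard_normal(2)  # predictable
        x[t, 2:] = rng.standard_normal(2)                               # pure noise
    w = relevance_weights(x, predictor=lambda prev: 0.95 * prev)
    print(w[-1])                                  # noise channels end up de-emphasized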