Adaptive Mixture of Probabilistic Transducers
We introduce and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each model in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best model from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.
Classifying Facial Action
Bartlett, Marian Stewart, Viola, Paul A., Sejnowski, Terrence J., Golomb, Beatrice A., Larsen, Jan, Hager, Joseph C., Ekman, Paul
The Facial Action Coding System (FACS), devised by Ekman and Friesen (1978), provides an objective means for measuring the facial muscle contractions involved in a facial expression. In this paper, we approach automated facial expression analysis by detecting and classifying facial actions. We generated a database of over 1100 image sequences of 24 subjects performing over 150 distinct facial actions or action combinations. We compare three different approaches to classifying the facial actions in these images: holistic spatial analysis based on principal components of graylevel images; explicit measurement of local image features such as wrinkles; and template matching with motion flow fields. On a dataset containing six individual actions and 20 subjects, these methods generalized to novel subjects with 89%, 57%, and 85% accuracy, respectively. When combined, performance improved to 92%.
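As a rough illustration of the holistic approach described above (not the paper's actual pipeline; the data here is synthetic), one can project graylevel images onto their principal components and classify in the reduced space, e.g. by nearest class centroid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": two facial-action classes as noisy 8x8 graylevel patterns.
base = rng.normal(size=(2, 64))
X = np.vstack([base[c] + 0.3 * rng.normal(size=(30, 64)) for c in (0, 1)])
y = np.repeat([0, 1], 30)

# Holistic spatial analysis: principal components of the graylevel images.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:10]              # top 10 principal components
Z = (X - mean) @ components.T     # project images into PCA space

# Nearest-centroid classification in the reduced space.
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
print((pred == y).mean())  # training accuracy on this easy synthetic set
```

On this well-separated synthetic set the classifier is essentially perfect; the point is only to show the structure of the holistic (PCA-based) approach.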
Control of Selective Visual Attention: Modeling the "Where" Pathway
Intermediate and higher vision processes require selection of a subset of the available sensory information before further processing. Usually, this selection is implemented in the form of a spatially circumscribed region of the visual field, the so-called "focus of attention", which scans the visual scene dependent on the input and on the attentional state of the subject. We here present a model for the control of the focus of attention in primates, based on a saliency map. This mechanism is not only expected to model the functionality of biological vision but also to be essential for the understanding of complex scenes in machine vision.
Unsupervised Pixel-prediction
When a sensory system constructs a model of the environment from its input, it might need to verify the model's accuracy. One method of verification is multivariate time-series prediction: a good model could predict the near-future activity of its inputs, much as a good scientific theory predicts future data. Such a predicting model would require copious top-down connections to compare the predictions with the input. That feedback could improve the model's performance in two ways: by biasing internal activity toward expected patterns, and by generating specific error signals if the predictions fail. A proof-of-concept model (an event-driven, computationally efficient layered network incorporating "cortical" features like all-excitatory synapses and local inhibition) was constructed to make near-future predictions of a simple, moving stimulus.
Rapid Quality Estimation of Neural Network Input Representations
Cherkauer, Kevin J., Shavlik, Jude W.
Artificial neural networks (ANNs), however, are usually costly to train, preventing one from trying many different representations. In this paper, we address this problem by introducing and evaluating three new measures for quickly estimating ANN input representation quality. Two of these, called DBleaves and Min(leaves), consistently outperform Rendell and Ragavan's (1993) blurring measure in accurately ranking different input representations for ANN learning on three difficult, real-world datasets.
Stable Fitted Reinforcement Learning
We describe the reinforcement learning problem, motivate algorithms which seek an approximation to the Q function, and present new convergence results for two such algorithms. 1 INTRODUCTION AND BACKGROUND Imagine an agent acting in some environment. At time t, the environment is in some state x_t chosen from a finite set of states. The agent perceives x_t, and is allowed to choose an action a_t from some finite set of actions. The environment then moves to a new state, chosen from a distribution that depends only on x_t and a_t. Meanwhile, the agent experiences a real-valued cost c_t, chosen from a distribution which also depends only on x_t and a_t and which has finite mean and variance. Such an environment is called a Markov decision process, or MDP.
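The setup above can be made concrete with a minimal tabular Q-learning sketch on a toy two-state MDP (this is a standard illustration of approximating the Q function, not the paper's fitted algorithms; all names and parameters here are illustrative):

```python
import random

def q_learning(n_states, n_actions, step, alpha=0.1, gamma=0.9,
               epsilon=0.1, episodes=500, horizon=50, seed=0):
    """Tabular Q-learning where Q[x][a] estimates expected discounted cost,
    so the greedy policy MINIMIZES over actions."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        x = 0
        for _ in range(horizon):
            if rng.random() < epsilon:                      # explore
                a = rng.randrange(n_actions)
            else:                                           # exploit
                a = min(range(n_actions), key=lambda a: Q[x][a])
            cost, x_next = step(x, a, rng)
            target = cost + gamma * min(Q[x_next])
            Q[x][a] += alpha * (target - Q[x][a])
            x = x_next
    return Q

# Toy MDP: action 0 stays put (cost 1), action 1 moves on (cost 0).
def step(x, a, rng):
    if a == 1:
        return 0.0, (x + 1) % 2
    return 1.0, x

Q = q_learning(2, 2, step)
best = [min(range(2), key=lambda a: Q[x][a]) for x in range(2)]
print(best)  # the zero-cost action 1 is learned as best in both states
```

The fitted algorithms analyzed in the paper replace the table Q with a function approximator; the update structure (bootstrapping a target from the cost plus the discounted best next value) is the same.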
Optimizing Cortical Mappings
Goodhill, Geoffrey J., Finch, Steven, Sejnowski, Terrence J.
"Topographic" mappings occur frequently in the brain. A popular approach to understanding the structure of such mappings is to map points representing input features in a space of a few dimensions to points in a 2-dimensional space using some self-organizing algorithm. We argue that a more general approach may be useful, where similarities between features are not constrained to be geometric distances, and the objective function for topographic matching is chosen explicitly rather than being specified implicitly by the self-organizing algorithm. We investigate analytically an example of this more general approach applied to the structure of interdigitated mappings, such as the pattern of ocular dominance columns in primary visual cortex. 1 INTRODUCTION A prevalent feature of mappings in the brain is that they are often "topographic". In the most straightforward case this simply means that neighbouring points on a two-dimensional sheet (e.g. the retina) are mapped to neighbouring points in a more central two-dimensional structure (e.g. the optic tectum). However, a more complex case, still often referred to as topographic, is the mapping from an abstract space of features (e.g.
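To make "an explicitly chosen objective function for topographic matching" concrete, here is one hypothetical such objective (the paper's actual functional may differ): penalize the mismatch between given feature similarities and similarities induced by the 2-D map positions.

```python
import numpy as np

def topographic_cost(sim, pos):
    """Mismatch between feature similarities sim[i, j] and similarities
    derived from 2-D map positions pos[i]. The exponential kernel is an
    illustrative choice, not the paper's objective."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    map_sim = np.exp(-d)          # nearby points on the sheet count as similar
    return ((sim - map_sim) ** 2).sum()

# Three features: 0 and 1 are similar, 2 is dissimilar to both.
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
good = np.array([[0.0, 0.0], [0.3, 0.0], [3.0, 0.0]])  # 0 and 1 adjacent
bad = np.array([[0.0, 0.0], [3.0, 0.0], [0.3, 0.0]])   # 0 and 2 adjacent
print(topographic_cost(sim, good) < topographic_cost(sim, bad))
```

The mapping that places similar features near each other scores lower, which is exactly the property an explicit objective lets one state (and optimize) directly, rather than leaving it implicit in a self-organizing algorithm.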
Bayesian Methods for Mixtures of Experts
Waterhouse, Steve R., MacKay, David, Robinson, Anthony J.
We present a Bayesian framework for inferring the parameters of a mixture of experts model based on ensemble learning by variational free energy minimisation. The Bayesian approach avoids the over-fitting and noise level underestimation problems of traditional maximum likelihood inference. We demonstrate these methods on artificial problems and sunspot time series prediction. INTRODUCTION The task of estimating the parameters of adaptive models such as artificial neural networks using Maximum Likelihood (ML) is well documented, e.g. Geman, Bienenstock & Doursat (1992). ML estimates typically lead to models with high variance, a phenomenon known as "over-fitting".
The Capacity of a Bump
Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.
REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition
Konig, Yochai, Bourlard, Hervé, Morgan, Nelson
In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward algorithm (Liporace 1982) for the estimation of sequence likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based speech recognition using Artificial Neural Networks (ANNs) to generate probabilities for Hidden Markov Models (HMMs). In the new approach, we use local conditional posterior probabilities of transitions to estimate global posterior probabilities of word sequences. Although we still use ANNs to estimate posterior probabilities, the network is trained with targets that are themselves estimates of local posterior probabilities. An initial experimental result shows a significant decrease in error rate in comparison to a baseline system. 1 INTRODUCTION The ultimate goal in speech recognition is to determine the sequence of words that has been uttered.
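The core idea of combining local conditional transition posteriors into a global sequence posterior can be sketched as follows (a deliberately simplified version: the posterior of one state path is the product of local transition posteriors along it, which omits REMAP's full recursion over all paths):

```python
import math

def sequence_log_posterior(local_post, path):
    """Combine local conditional transition posteriors
    P(q_t = j | q_{t-1} = i, x_t), given as local_post[t][i][j],
    into the log posterior of a single state path. A simplified
    sketch, not the full REMAP recursion."""
    logp = 0.0
    prev = path[0]
    for t, state in enumerate(path[1:], start=1):
        logp += math.log(local_post[t][prev][state])
        prev = state
    return logp

# Toy example: 2 states, 3 frames.
local_post = [
    None,                           # no transition into frame 0
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.2, 0.8], [0.6, 0.4]],
]
print(sequence_log_posterior(local_post, [0, 0, 1]))  # log(0.9 * 0.8)
```

In REMAP itself, these local posteriors come from an ANN, and the recursion also re-estimates the targets the ANN is trained on.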