Learning Graphical Models
Tempo tracking and rhythm quantization by sequential Monte Carlo
Cemgil, Ali Taylan, Kappen, Bert
We present a probabilistic generative model for timing deviations in expressive music. The structure of the proposed model is equivalent to a switching state space model. We formulate two well-known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out.
Unsupervised Learning of Human Motion Models
Song, Yang, Goncalves, Luis, Perona, Pietro
This paper presents an unsupervised learning algorithm that can derive the probabilistic dependence structure of parts of an object (a moving human body in our examples) automatically from unlabeled data. The distinguishing feature of this work is that it is based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter, and the correspondence between the parts and the detected features is unknown. We use decomposable triangulated graphs to depict the probabilistic independence of parts, but the unsupervised technique is not limited to this type of graph. In the new approach, the labeling of the data (part assignments) is treated as a set of hidden variables and the EM algorithm is applied. A greedy algorithm is developed to select parts and to search for the optimal structure based on the differential entropy of these variables. The success of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled real image sequences.
Learning Body Pose via Specialized Maps
Rosales, Rómer, Sclaroff, Stan
A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. In everyday life, humans can easily estimate body part locations (body pose) from relatively low-resolution images of the projected 3D world (e.g., when viewing a photograph or a video). However, body pose estimation is a very difficult computer vision problem.
Speech Recognition with Missing Data using Recurrent Neural Nets
In the 'missing data' approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectral-temporal regions which are dominated by the speech source. The remaining regions are considered to be 'missing'. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks (RNNs). In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
Audio-Visual Sound Separation Via Hidden Markov Models
Hershey, John R., Casey, Michael
It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audiovisual information for the task of speech enhancement. We propose a method to exploit audiovisual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factorially combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide-band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audiovisual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.
Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Shon, Aaron P., Hsu, David, Diorio, Chris
We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.
Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions
The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of samples in the training data. This sets a practical limit to using large databases. We suggest immediate divorce of the two methods and remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Model (GMM). This allows all of the computation to take place in the low-dimensional target space, and it reduces the computational complexity so that it is proportional to the square of the number of components in the mixtures. Furthermore, a convenient extension to Hidden Markov Models as commonly used in speech recognition becomes possible.
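The complexity argument in this abstract can be illustrated with a small sketch. It assumes the quadratic (order-2) Renyi entropy, H2 = -log ∫ p(x)² dx, and isotropic Gaussian kernels and mixture components; the abstract specifies neither, and the function names below are hypothetical. Under these assumptions the integral has a closed form as a sum over pairs of Gaussians: N² pairs for a Parzen estimate over N samples, but only K² pairs for a K-component GMM. This is not the paper's training procedure, only the entropy evaluation that drives its cost.

```python
import numpy as np

def gauss(diff, var):
    """Isotropic Gaussian density N(diff; 0, var * I), evaluated along the last axis."""
    d = diff.shape[-1]
    sq = np.sum(diff ** 2, axis=-1)
    return np.exp(-0.5 * sq / var) / (2.0 * np.pi * var) ** (d / 2.0)

def renyi2_parzen(x, sigma2):
    """Quadratic Renyi entropy of a Parzen estimate (kernel variance sigma2).

    Sums over all N^2 sample pairs, so the cost grows as O(N^2).
    """
    diff = x[:, None, :] - x[None, :, :]          # shape (N, N, d)
    potential = gauss(diff, 2.0 * sigma2).mean()  # (1/N^2) * sum over pairs
    return -np.log(potential)

def renyi2_gmm(weights, means, variances):
    """Quadratic Renyi entropy of an isotropic GMM.

    Sums over all K^2 component pairs, so the cost grows as O(K^2).
    """
    diff = means[:, None, :] - means[None, :, :]       # shape (K, K, d)
    var_sum = variances[:, None] + variances[None, :]  # shape (K, K)
    pair = gauss(diff, var_sum)
    potential = np.sum(weights[:, None] * weights[None, :] * pair)
    return -np.log(potential)
```

A Parzen estimate is itself a GMM with one equally weighted component per sample, so the two functions agree when called that way; the saving comes from choosing K much smaller than N.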
Risk Sensitive Particle Filters
Thrun, Sebastian, Langford, John, Verma, Vandi
We propose a new particle filter that incorporates a model of costs when generating particles. The approach is motivated by the observation that the costs of accidentally not tracking hypotheses might be significant in some areas of state space, and next to irrelevant in others. By incorporating a cost model into particle filtering, states that are more critical to the system performance are more likely to be tracked. Automatic calculation of the cost model is implemented using an MDP value function calculation that estimates the value of tracking a particular state. Experiments in two mobile robot domains illustrate the appropriateness of the approach.
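One way to picture cost-aware particle generation is a resampling step that draws particles in proportion to posterior weight times a risk value, then keeps correction weights of 1/risk so that posterior expectations can still be recovered. The sketch below is an illustration of that general idea only, not the authors' algorithm: the function name and the `risk` callable are hypothetical, and the MDP value function calculation mentioned in the abstract is not implemented here (the risk function is simply supplied by the caller).

```python
import numpy as np

def risk_sensitive_resample(particles, weights, risk, rng):
    """Resample in proportion to weight * risk (illustrative sketch).

    particles: array of states, shape (N,) or (N, d)
    weights:   nonnegative importance weights, shape (N,)
    risk:      hypothetical callable mapping states to positive risk values, shape (N,)
    rng:       numpy.random.Generator
    Returns the resampled particles and normalized correction weights (1 / risk).
    """
    q = weights * risk(particles)
    q = q / q.sum()                      # risk-weighted sampling distribution
    idx = rng.choice(len(particles), size=len(particles), p=q)
    chosen = particles[idx]
    corr = 1.0 / risk(chosen)            # undo the risk bias for posterior estimates
    corr = corr / corr.sum()
    return chosen, corr
```

With a constant risk function this reduces to ordinary multinomial resampling with uniform output weights; with a non-constant one, high-risk regions receive more particles while each of those particles carries a smaller correction weight.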