Undirected Networks
Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of samples in the training data. This sets a practical limit to using large databases. We suggest immediate divorce of the two methods and remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Models (GMM). This al- lows all of the computation to take place in the low dimensional target space, and it reduces computational complexity proportional to square of the number of components in the mixtures. Furthermore, a conve- nient extension to Hidden Markov Models as commonly used in speech recognition becomes possible.
Efficient Resources Allocation for Markov Decision Processes
It is desirable that a complex decision-making problem in an uncer(cid:173) tain world be adequately modeled by a Markov Decision Process (MDP) whose structural representation is adaptively designed by a parsimonious resources allocation process. Resources include time and cost of exploration, amount of memory and computational time allowed for the policy or value function representation. Concerned about making the best use of the available resources, we address the problem of efficiently estimating where adding extra resources is highly needed in order to improve the expected performance of the resulting policy. Possible application in reinforcement learning (RL), when real-world exploration is highly costly, concerns the de(cid:173) tection of those areas of the state-space that need primarily to be explored in order to improve the policy. Another application con(cid:173) cerns approximation of continuous state-space stochastic control problems using adaptive discretization techniques for which highly efficient grid points allocation is mandatory to survive high dimen(cid:173) sionality.
Audio-Visual Sound Separation Via Hidden Markov Models
It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are fac(cid:173) tori ally combined, to incorporate visual lip information and em(cid:173) ploy novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combina(cid:173) torial explosion in the factorial model by using a simple approxi(cid:173) mate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.
Modeling Temporal Structure in Classical Conditioning
The Temporal Coding Hypothesis of Miller and colleagues [7] sug(cid:173) gests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a con(cid:173) strained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order condition(cid:173) ing experiments of Miller et al. [1, 2, 3], which other models are unable to explain.
Sequential Noise Compensation by Sequential Monte Carlo Method
We present a sequential Monte Carlo method applied to additive noise compensation for robust speech recognition in time-varying noise. The method generates a set of samples according to the prior distribution given by clean speech models and noise prior evolved from previous estimation. An explicit model representing noise ef- fects on speech features is used, so that an extended Kalman filter is constructed for each sample, generating the updated continuous state estimate as the estimation of the noise parameter, and predic- tion likelihood for weighting each sample. Minimum mean square error (MMSE) inference of the time-varying noise parameter is car- ried out over these samples by fusion the estimation of samples ac- cording to their weights. A residual resampling selection step and a Metropolis-Hastings smoothing step are used to improve calcula- tion e#ciency. Experiments were conducted on speech recognition in simulated non-stationary noises, where noise power changed ar- tificially, and highly non-stationary Machinegun noise.
Linear-time inference in Hierarchical HMMs
The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original infer- is ence algorithm is rather complicated, and takes the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many stan- dard approximation techniques to further speed up inference.
Dynamic Time-Alignment Kernel in Support Vector Machine
A new class of Support Vector Machine (SVM) that is applica- ble to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of non-linear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, stan- dard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimen- tal results show comparable recognition performance with hidden Markov models (HMMs). 1 Introduction Support Vector Machine (SVM) [1] is one of the latest and most successful statistical pattern classifier that utilizes a kernel technique [2, 3].
Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Logistic units in the first hidden layer of a feedforward neural net(cid:173) work compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's.
Fast, Large-Scale Transformation-Invariant Clustering
In previous work on transformed mixtures of Gaussians'' andtransformed hidden Markov models'', we showed how the EM al- gorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabili- ties over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we de- scribe how a tremendous speed-up is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. We give results on learning a 4-component mixture model from a video sequence with frames of size 320240.
Speech Recognition with Missing Data using Recurrent Neural Nets
In the missing data' approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectral- temporal regions which are dominated by the speech source. The remaining regions are considered to bemissing'. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task. 1. Introduction Automatic Speech Recognition systems perform reasonably well in controlled and matched training and recognition conditions.