Goto

Collaborating Authors

 Undirected Networks


Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions

Neural Information Processing Systems

The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of samples in the training data. This sets a practical limit to using large databases. We suggest immediate divorce of the two methods and remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Models (GMM). This allows all of the computation to take place in the low dimensional target space, and it reduces computational complexity proportional to square of the number of components in the mixtures. Furthermore, a convenient extension to Hidden Markov Models as commonly used in speech recognition becomes possible.


Risk Sensitive Particle Filters

Neural Information Processing Systems

We propose a new particle filter that incorporates a model of costs when generating particles. The approach is motivated by the observation that the costs of accidentally not tracking hypotheses might be significant in some areas of state space, and next to irrelevant in others. By incorporating a cost model into particle filtering, states that are more critical to the system performance are more likely to be tracked. Automatic calculation of the cost model is implemented using an MDP value function calculation that estimates the value of tracking a particular state. Experiments in two mobile robot domains illustrate the appropriateness of the approach.


Dynamic Time-Alignment Kernel in Support Vector Machine

Neural Information Processing Systems

A new class of Support Vector Machine (SVM) that is applicable to sequential-pattern recognition such as speech recognition is developed by incorporating an idea of nonlinear time alignment into the kernel function. Since the time-alignment operation of sequential pattern is embedded in the new kernel function, standard SVM training and classification algorithms can be employed without further modifications. The proposed SVM (DTAK-SVM) is evaluated in speaker-dependent speech recognition experiments of hand-segmented phoneme recognition. Preliminary experimental results show comparable recognition performance with hidden Markov models (HMMs).


Linear-time inference in Hierarchical HMMs

Neural Information Processing Systems

The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98].


Fast, Large-Scale Transformation-Invariant Clustering

Neural Information Processing Systems

In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speedup is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities.


Convolution Kernels for Natural Language

Neural Information Processing Systems

We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.


The Infinite Hidden Markov Model

Neural Information Processing Systems

We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite-- consider, for example, symbols being possible words appearing in English text.


ACh, Uncertainty, and Cortical Inference

Neural Information Processing Systems

Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.


Modeling Temporal Structure in Classical Conditioning

Neural Information Processing Systems

The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1, 2, 3], which other models are unable to explain.


Efficient Resources Allocation for Markov Decision Processes

Neural Information Processing Systems

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. IT we select the most probable values of the parameters, we can build a MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.