Goto

Collaborating Authors

 Technology


Markov Processes on Curves for Automatic Speech Recognition

Neural Information Processing Systems

It is widely observed, for example, that fast speech is more prone to recognition errors than slow speech. A related effect, occurring at the phoneme level, is that consonants (l,re more frequently botched than vowels. Generally speaking, consonants have short-lived, non-stationary acoustic signatures; vowels, just the opposite. Thus, at the phoneme level, we can view the increased confusability of consonants as a consequence of locally fast speech.


Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs

Neural Information Processing Systems

We describe Maximum-Likelihood Continuity Mapping (MALCOM), an alternative to hidden Markov models (HMMs) for processing sequence data such as speech. While HMMs have a discrete "hidden" space constrained by a fixed finite-automaton architecture, MALCOM has a continuous hidden space-a continuity map-that is constrained only by a smoothness requirement on paths through the space. MALCOM fits into the same probabilistic framework for speech recognition as HMMs, but it represents a more realistic model of the speech production process. To evaluate the extent to which MALCOM captures speech production information, we generated continuous speech continuity maps for three speakers and used the paths through them to predict measured speech articulator data. The median correlation between the MALCOM paths obtained from only the speech acoustics and articulator measurements was 0.77 on an independent test set not used to train MALCOM or the predictor.


Controlling the Complexity of HMM Systems by Regularization

Neural Information Processing Systems

This paper introduces a method for regularization ofHMM systems that avoids parameter overfitting caused by insufficient training data. Regularization is done by augmenting the EM training method by a penalty term that favors simple and smooth HMM systems. The penalty term is constructed as a mixture model of negative exponential distributions that is assumed to generate the state dependent emission probabilities of the HMMs. This new method is the successful transfer of a well known regularization approach in neural networks to the HMM domain and can be interpreted as a generalization of traditional state-tying for HMM systems. The effect of regularization is demonstrated for continuous speech recognition tasks by improving overfitted triphone models and by speaker adaptation with limited training data. 1 Introduction One general problem when constructing statistical pattern recognition systems is to ensure the capability to generalize well, i.e. the system must be able to classify data that is not contained in the training data set.


Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations

Neural Information Processing Systems

A common way to represent a time series is to divide it into shortduration blocks, each of which is then represented by a set of basis functions. A limitation of this approach, however, is that the temporal alignment of the basis functions with the underlying structure in the time series is arbitrary. We present an algorithm for encoding a time series that does not require blocking the data. The algorithm finds an efficient representation by inferring the best temporal positions for functions in a kernel basis. These can have arbitrary temporal extent and are not constrained to be orthogonal.


An Entropic Estimator for Structure Discovery

Neural Information Processing Systems

We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an en tropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes a overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables. Applied to hidden Markov models (HMMs), it finds a concise finite-state machine representing the hidden structure of a signal. We entropically model music, handwriting, and video time-series, and show that the resulting models are highly concise, structured, predictive, and interpretable: Surviving states tend to be highly correlated with meaningful partitions of the data, while surviving transitions provide a low-perplexity model of the signal dynamics.


A High Performance k-NN Classifier Using a Binary Correlation Matrix Memory

Neural Information Processing Systems

This paper presents a novel and fast k-NN classifier that is based on a binary CMM (Correlation Matrix Memory) neural network. A robust encoding method is developed to meet CMM input requirements. A hardware implementation of the CMM is described, which gives over 200 times the speed of a current mid-range workstation, and is scaleable to very large problems. When tested on several benchmarks and compared with a simple k-NN method, the CMM classifier gave less than I % lower accuracy and over 4 and 12 times speedup in software and hardware respectively.


Computation of Smooth Optical Flow in a Feedback Connected Analog Network

Neural Information Processing Systems

In 1986, Tanner and Mead [1] implemented an interesting constraint satisfaction circuit for global motion sensing in a VLSI. We report here a new and improved a VLSI implementation that provides smooth optical flow as well as global motion in a two dimensional visual field. The computation of optical flow is an ill-posed problem, which expresses itself as the aperture problem. However, the optical flow can be estimated by the use of regularization methods, in which additional constraints are introduced in terms of a global energy functional that must be minimized. We show how the algorithmic constraints of Hom and Schunck [2] on computing smooth optical flow can be mapped onto the physical constraints of an equivalent electronic network.


An Integrated Vision Sensor for the Computation of Optical Flow Singular Points

Neural Information Processing Systems

A robust, integrative algorithm is presented for computing the position of the focus of expansion or axis of rotation (the singular point) in optical flow fields such as those generated by self-motion. Measurements are shown of a fully parallel CMOS analog VLSI motion sensor array which computes the direction of local motion (sign of optical flow) at each pixel and can directly implement this algorithm. The flow field singular point is computed in real time with a power consumption of less than 2 m W. Computation of the singular point for more general flow fields requires measures of field expansion and rotation, which it is shown can also be computed in real-time hardware, again using only the sign of the optical flow field. These measures, along with the location of the singular point, provide robust real-time self-motion information for the visual guidance of a moving platform such as a robot.


A Neuromorphic Monaural Sound Localizer

Neural Information Processing Systems

We describe the first single microphone sound localization system and its inspiration from theories of human monaural sound localization. Reflections and diffractions caused by the external ear (pinna) allow humans to estimate sound source elevations using only one ear. Our single microphone localization model relies on a specially shaped reflecting structure that serves the role of the pinna. Specially designed analog VLSI circuitry uses echo-time processing to localize the sound. A CMOS integrated circuit has been designed, fabricated, and successfully demonstrated on actual sounds. 1 Introduction The principal cues for human sound localization arise from time and intensity differences between the signals received at the two ears. For low-frequency components of sounds (below 1500Hz for humans), the phase-derived interaural time difference (lTD) can be used to localize the sound source. For these frequencies, the sound wavelength is at least several times larger than the head and the amount of shadowing (which depends on the wavelength of the sound compared with the dimensions of the head) is negligible.


VLSI Implementation of Motion Centroid Localization for Autonomous Navigation

Neural Information Processing Systems

A circuit for fast, compact and low-power focal-plane motion centroid localization is presented. This chip, which uses mixed signal CMOS components to implement photodetection, edge detection, ONset detection and centroid localization, models the retina and superior colliculus. The centroid localization circuit uses time-windowed asynchronously triggered row and column address events and two linear resistive grids to provide the analog coordinates of the motion centroid. This VLSI chip is used to realize fast lightweight autonavigating vehicles. The obstacle avoiding line-following algorithm is discussed.