Levin, Esther


Planar Hidden Markov Modeling: From Speech to Optical Character Recognition

Neural Information Processing Systems

We propose in this paper a statistical model (planar hidden Markov model - PHMM) describing statistical properties of images. The model generalizes the single-dimensional HMM, used for speech processing, to the planar case. For this model to be useful, an efficient segmentation algorithm, similar to the Viterbi algorithm for HMMs, must exist. We present conditions on the PHMM parameters that are sufficient to guarantee that the planar segmentation problem can be solved in polynomial time, and describe such an algorithm. This algorithm optimally aligns the image with the model and is therefore insensitive to elastic distortions of images. Using this algorithm, a joint optimal segmentation and recognition of the image can be performed, thus overcoming the weakness of traditional OCR systems, where segmentation is performed independently before recognition, leading to unrecoverable recognition errors. The PHMM approach was evaluated using a set of isolated hand-written digits. An overall digit recognition accuracy of 95% was achieved. An analysis of the results showed that even in the simple case of recognition of isolated characters, the elimination of elastic distortions enhances the performance significantly. We expect the advantage of this approach to be even more significant for tasks such as connected writing recognition/spotting, for which there is no known high-accuracy method of recognition.
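
As a point of reference for the segmentation problem discussed above, the following is a minimal NumPy sketch of the standard one-dimensional Viterbi recursion that the PHMM segmentation algorithm generalizes. It is not the planar algorithm of the paper; the function name and the precomputed log-probability inputs are illustrative assumptions.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Standard 1-D Viterbi decoding for an HMM.

    log_A  : (S, S) log transition probabilities
    log_B  : (T, S) log emission probabilities of each observation under each state
    log_pi : (S,)   log initial state probabilities
    Returns the most likely state sequence (length T).
    """
    T, S = log_B.shape
    delta = np.empty((T, S))           # best log-score of any path ending in state s at time t
    psi = np.zeros((T, S), dtype=int)  # back-pointers

    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (S, S): previous state -> current state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]

    # Backtrack the optimal state sequence.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

The paper's contribution is the set of conditions under which an analogous dynamic-programming alignment remains polynomial in the two-dimensional (planar) case.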


Time-Warping Network: A Hybrid Framework for Speech Recognition

Neural Information Processing Systems

Hybrid HMM/neural-network systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we define a time-warping (TW) neuron that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of TW neurons is equivalent to a Gaussian-density HMM-based recognition system.
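
As an illustration of the idea only (not the authors' exact formulation), a time-warping neuron can be sketched as a unit that aligns its input frame sequence to its weight template by a monotone, DTW-style warp and outputs the best alignment score. The function name and the negative-squared-distance local score below are assumptions, the latter chosen to reflect the Gaussian-density connection.

```python
import numpy as np

def tw_neuron(x, w):
    """Time-warping neuron activation (illustrative sketch).

    x : (T, d) input pattern, a sequence of T feature frames
    w : (K, d) the neuron's weights, viewed as a template of K frames (assumes T >= K)
    The unit warps the input onto its weight template with a monotone,
    left-to-right alignment and returns the score of the best alignment.
    The local match score is the negative squared distance, an assumption
    of this sketch that mirrors a Gaussian-density HMM state score.
    """
    T, d = x.shape
    K, _ = w.shape
    local = -((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)  # (T, K)

    score = np.full((T, K), -np.inf)
    score[0, 0] = local[0, 0]
    for t in range(1, T):
        for k in range(K):
            # stay in the same template frame or advance to the next one
            best_prev = score[t - 1, k]
            if k > 0:
                best_prev = max(best_prev, score[t - 1, k - 1])
            score[t, k] = best_prev + local[t, k]
    return score[-1, -1]
```

In a single-layer network of such units, one per class, classification would simply pick the unit with the highest alignment score.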


Modeling Time Varying Systems Using Hidden Control Neural Architecture

Neural Information Processing Systems

This paper introduces a generalization of the layered neural network that can implement a time-varying nonlinear mapping between its observable input and output. The variation of the network's mapping is due to an additional, hidden control input, while the network parameters remain unchanged. We propose an algorithm for finding the network parameters and the hidden control sequence from a training set of examples of observable input and output. This algorithm implements an approximate maximum-likelihood estimation of the parameters of an equivalent statistical model, where only the dominant control sequence is taken into account. The conceptual difference between the proposed model and the HMM is that in the HMM approach the observable data in each state is modeled as though it were produced by a memoryless source, and a parametric description of this source is obtained during training, whereas in the proposed model the observations in each state are produced by a nonlinear dynamical system driven by noise, and both the parametric form of the dynamics and the noise are estimated. The performance of the model was illustrated for the tasks of nonlinear time-varying system modeling and continuously spoken digit recognition. The reported results show the potential of this model for providing high-performance speech recognition capability. Acknowledgment: Special thanks are due to N. Merhav for numerous comments and helpful discussions.
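
To make the architecture concrete, here is a hedged sketch, not the paper's implementation, of a predictor whose input-output mapping is switched by an extra one-hot control input, together with a simplified frame-wise search for the control sequence that best explains the data (the paper's approximate maximum-likelihood training uses the dominant control sequence; the class and function names, layer sizes, and frame-wise search are assumptions of the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

class HiddenControlNet:
    """Feedforward predictor with an extra 'hidden control' input (sketch).

    The observable input x is concatenated with a one-hot control vector c;
    the network parameters stay fixed while c switches the mapping.
    """
    def __init__(self, d_in, d_hidden, d_out, n_controls):
        self.n_controls = n_controls
        self.W1 = rng.normal(0, 0.1, (d_in + n_controls, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0, 0.1, (d_hidden, d_out))
        self.b2 = np.zeros(d_out)

    def predict(self, x, control):
        c = np.eye(self.n_controls)[control]          # one-hot control input
        h = np.tanh(np.concatenate([x, c]) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

def best_control_sequence(net, xs, ys):
    """Pick, independently per frame, the control minimizing squared error.
    A simplified stand-in for the dominant-control-sequence step; training
    would alternate this search with gradient updates of the weights."""
    seq = []
    for x, y in zip(xs, ys):
        errs = [np.sum((net.predict(x, c) - y) ** 2) for c in range(net.n_controls)]
        seq.append(int(np.argmin(errs)))
    return seq
```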

