Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Zhao, Ying, Schwartz, Richard M., Sroka, Jason J., Makhoul, John

Neural Information Processing Systems 

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.

1 Introduction

Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems. We present here a new CD-HMM system which has similar properties and possesses the same advantages as decision trees, but has the additional important advantage of having automatically trainable "soft" decision boundaries.

** MIT, Cambridge MA 02139

2 Hierarchical Mixtures of Experts

The method of Hierarchical Mixtures of Experts (HME) developed recently by Jordan [1] breaks a large-scale task into many small ones by partitioning the input space into a nested set of regions, then building a simple but specific model (local expert) in each region.
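The nested "soft" partitioning described above can be illustrated with a minimal sketch of an HME forward pass. This is not the authors' system: it assumes a two-level tree with two branches per level, linear-softmax gating networks and experts, and made-up weights. The names `softmax`, `linear`, and `hme_forward` are hypothetical, and EM training of the parameters is not shown.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply one weight row per output; the last entry of each row is a bias."""
    return [sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1] for row in weights]


def hme_forward(x, top_gate, branch_gates, experts):
    """Mix expert outputs with nested soft gates (assumed two-level tree).

    top_gate:        weight rows, one per top-level branch
    branch_gates[i]: weight rows, one per expert under branch i
    experts[i][j]:   weight rows, one per output class
    """
    g_top = softmax(linear(top_gate, x))
    n_classes = len(experts[0][0])
    mixed = [0.0] * n_classes
    for i, g_i in enumerate(g_top):
        g_low = softmax(linear(branch_gates[i], x))
        for j, g_ji in enumerate(g_low):
            expert_probs = softmax(linear(experts[i][j], x))
            # Every expert contributes, weighted by the product of the soft
            # gate values along its path from the root: a soft decision
            # rather than a hard one.
            for k, p in enumerate(expert_probs):
                mixed[k] += g_i * g_ji * p
    return mixed


# Illustrative (made-up) parameters: 2-dimensional input, 3 output classes.
x = [0.5, -1.0]
top_gate = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
branch_gates = [
    [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
    [[0.5, 0.5, 0.0], [-0.5, -0.5, 0.0]],
]
experts = [
    [
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5]],
    ],
    [
        [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]],
        [[0.5, 0.5, 0.0], [-0.5, 0.5, 0.0], [0.0, 0.0, -0.5]],
    ],
]

probs = hme_forward(x, top_gate, branch_gates, experts)
print(probs)
```

Because each gate outputs a probability distribution and each expert does too, the mixed output is itself a valid distribution over classes. A hard decision tree would correspond to the limiting case in which each gate puts all of its weight on a single branch.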
