RecNorm: Simultaneous Normalisation and Classification applied to Speech Recognition

Bridle, John S., Cox, Stephen J.

Neural Information Processing Systems 

A particular form of neural network is described, which has terminals for acoustic patterns, class labels and speaker parameters. A method of training this network to "tune in" the speaker parameters to a particular speaker is outlined, based on a trick for converting a supervised network to an unsupervised mode. We describe experiments using this approach in isolated word recognition based on whole-word hidden Markov models. The results indicate an improvement over speaker-independent performance and, for unlabelled data, a performance close to that achieved on labelled data.

1 INTRODUCTION

We are concerned to emulate some aspects of perception. In particular, we are interested in the way that a stimulus which is ambiguous, perhaps because of unknown lighting conditions, can become unambiguous in the context of other such stimuli: the fact that they are subject to the same unknown conditions gives our perceptual apparatus enough constraints to solve the problem.

Individual words are often ambiguous even to human listeners. For instance, a Cockney might say the word "ace" so that it sounds the same as a Standard English speaker's "ice". The same holds for "room" and "rum", or "work" and "walk", in other pairs of British English accents. If we heard one of these ambiguous pronunciations, knowing nothing else about the speaker, we could not tell which word had been said.
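The argument above, that tokens sharing the same unknown conditions constrain one another, can be illustrated with a toy sketch. This is not the network or the hidden Markov models used in the paper: it assumes a one-dimensional "acoustic" value, two hypothetical word prototypes (`CLASS_MEANS`), a speaker effect that is a simple additive shift, and a grid search over candidate shifts. All names and numbers are invented for illustration. The shift is chosen to maximise the likelihood of unlabelled tokens, summing over the possible classes, and the tokens are then classified after normalisation.

```python
import math

# Hypothetical 1-D prototypes for two words (illustrative values only).
CLASS_MEANS = {"ace": 0.0, "ice": 2.0}
SIGMA = 0.5  # assumed within-class standard deviation


def log_gauss(x, mean, sigma=SIGMA):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_likelihood(tokens, shift):
    """Log-likelihood of unlabelled tokens under a speaker shift.

    Each token's label is unknown, so its likelihood is averaged over
    the (equiprobable) classes after removing the shift.
    """
    total = 0.0
    for x in tokens:
        per_class = [log_gauss(x - shift, m) for m in CLASS_MEANS.values()]
        peak = max(per_class)  # log-sum-exp for numerical stability
        total += peak + math.log(
            sum(math.exp(v - peak) for v in per_class) / len(per_class)
        )
    return total


def tune_shift(tokens, candidates):
    """Pick the candidate speaker shift that best explains the tokens."""
    return max(candidates, key=lambda s: log_likelihood(tokens, s))


def classify(x, shift):
    """Label a token by the nearest prototype after normalisation."""
    return min(CLASS_MEANS, key=lambda c: abs((x - shift) - CLASS_MEANS[c]))


CANDIDATE_SHIFTS = [i * 0.5 for i in range(-4, 9)]  # -2.0, -1.5, ..., 4.0

# Four unlabelled tokens from one (hypothetical) speaker.
tokens = [2.1, 3.9, 4.0, 1.9]
shift = tune_shift(tokens, CANDIDATE_SHIFTS)
labels = [classify(x, shift) for x in tokens]
```

In this toy setting a single token at 2.1 is almost equally well explained as "ice" with no shift or as "ace" with a shift of 2.0. Once the tokens near 4.0 are taken into account, only the shifted interpretation explains the whole set, and the ambiguous token is resolved as "ace".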
