Connectionist Architectures for Multi-Speaker Phoneme Recognition
II, John B. Hampshire, Waibel, Alex
–Neural Information Processing Systems
We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single IDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network architecture capable of performing the six-speaker recognition task at the speaker dependent rate of 98.4%. In addition to its high recognition rate, the so-called "Meta-Pi" architecture learns - without direct supervision - to recognize the speech of one particular male speaker using internal models of other male speakers exclusively.
Neural Information Processing Systems
Dec-31-1990