Using Voice Transformations to Create Additional Training Talkers for Word Spotting
Neural Information Processing Systems
Speech recognizers provide good performance for most users, but the error rate often increases dramatically for a small percentage of talkers who are "different" from the talkers used for training. One expensive solution to this problem is to gather more training data in an attempt to sample these outlier users. A second solution, explored in this paper, is to artificially enlarge the number of training talkers by transforming the speech of existing training talkers. This approach is similar to enlarging the training set for OCR digit recognition by warping the training digit images, but it is more difficult because continuous speech has a much larger number of dimensions. We explored the use of simple linear spectral warping to enlarge a 48-talker training database used for word spotting.
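As a rough illustration of what linear spectral warping can look like, the sketch below rescales the frequency axis of a single magnitude spectrum by a constant factor and reuses that operation to derive extra "talkers" from existing frames. It is not the paper's implementation: the function names `warp_spectrum` and `make_synthetic_talkers`, the use of linear interpolation between bins, the edge handling, and the warp factors are all assumptions made for this example.

```python
def warp_spectrum(magnitudes, alpha):
    """Linearly warp the frequency axis of one magnitude spectrum.

    Output bin k takes the value found at source position k / alpha, so
    alpha > 1 moves spectral peaks up in frequency and alpha < 1 moves
    them down. (Hypothetical helper, not taken from the paper.)
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(magnitudes)
    warped = []
    for k in range(n):
        src = k / alpha              # fractional source bin for output bin k
        lo = int(src)
        if lo >= n - 1:
            # Source position falls at or beyond the last bin: hold the edge value.
            warped.append(float(magnitudes[-1]))
            continue
        frac = src - lo
        # Linear interpolation between the two neighbouring source bins.
        warped.append((1.0 - frac) * magnitudes[lo] + frac * magnitudes[lo + 1])
    return warped


def make_synthetic_talkers(frames, alphas):
    """Return one warped copy of every frame per warp factor (assumed scheme)."""
    return {alpha: [warp_spectrum(f, alpha) for f in frames] for alpha in alphas}


# A toy 8-bin spectrum with a single peak at bin 2.
spectrum = [0, 0, 1, 0, 0, 0, 0, 0]
synthetic = make_synthetic_talkers([spectrum], alphas=[0.5, 2.0])
```

With `alpha=2.0` the peak in the toy spectrum moves from bin 2 to bin 4, and with `alpha=0.5` it moves to bin 1. In practice the warp factors would be chosen much closer to 1 so that the transformed speech remains plausible.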
Apr-6-2023, 18:42:23 GMT