GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang
arXiv.org Artificial Intelligence
Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples that lie close to the boundary between keyword and non-keyword. Such boundary examples are often scarce in training data, which limits model performance. In this paper, we propose a method that systematically generates adversarial examples close to the decision boundary by applying insertion, deletion, and substitution edits to the keyword's graphemes. We evaluate this technique on held-out data for a popular keyword and show that it improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
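To make the grapheme-edit idea concrete, the following is a minimal sketch that enumerates every string one insertion, deletion, or substitution away from a keyword. It is an illustration only, not the authors' implementation: the function name `grapheme_edits`, the lowercase ASCII alphabet, and the example keyword "hello" are all assumptions, and the step that turns these text variants into synthesized audio is not shown.

```python
import string


def grapheme_edits(keyword, alphabet=string.ascii_lowercase):
    """Return all strings exactly one grapheme edit away from `keyword`.

    Hypothetical helper for illustration; the alphabet is an assumption.
    """
    variants = set()

    # Insertions: place each grapheme at every possible position.
    for i in range(len(keyword) + 1):
        for ch in alphabet:
            variants.add(keyword[:i] + ch + keyword[i:])

    for i in range(len(keyword)):
        # Deletions: drop the grapheme at position i.
        variants.add(keyword[:i] + keyword[i + 1:])
        # Substitutions: swap the grapheme at position i for a different one.
        for ch in alphabet:
            if ch != keyword[i]:
                variants.add(keyword[:i] + ch + keyword[i + 1:])

    # Guard so the keyword itself is never returned as a negative.
    variants.discard(keyword)
    return sorted(variants)


# Example with a hypothetical keyword (the paper's keyword is not named here).
candidates = grapheme_edits("hello")
```

Each resulting string (for example "helo", "hallo", or "hellow") is a candidate hard negative in text form. Whether a given variant is actually acoustically close to the keyword, or should be filtered out, depends on details the abstract does not specify.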
May-27-2025