Personalizing Keyword Spotting with Speaker Information

Labrador, Beltrán; Zhu, Pai; Zhao, Guanlong; Scarpati, Angelo Scorza; Wang, Quan; Lozano-Diez, Alicia; Park, Alex; Moreno, Ignacio López

Nov-6-2023–arXiv.org Artificial Intelligence

Keyword spotting systems often struggle to generalize across a diverse population of speakers with different accents and from different age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker recognition systems to extract speaker information, and we experiment with extracting this information from both the input audio and pre-enrolled user audio. We evaluate our systems on a diverse dataset and achieve a substantial improvement in keyword detection accuracy, particularly among underrepresented speaker groups. Moreover, our proposed approach requires only a 1% increase in the number of parameters, with minimal impact on latency and computational cost, which makes it a practical solution for real-world applications.
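To make the FiLM idea concrete, here is a minimal sketch of feature-wise linear modulation in plain Python: a conditioning vector (here, a speaker embedding) is mapped to a per-channel scale gamma and shift beta, which are applied to every frame of the keyword spotter's intermediate features. The abstract does not say where the modulation is applied or what the dimensions are, so the function names, shapes, and parameter layout below are assumptions for illustration only, not the authors' implementation.

```python
def linear(weights, bias, vec):
    """Dense layer on plain lists: returns weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, speaker_embedding, gamma_w, gamma_b, beta_w, beta_b):
    """Feature-wise linear modulation: gamma * x + beta for each channel.

    features: list of frames, each a list of C channel activations
              (hypothetical intermediate features of a keyword spotter).
    speaker_embedding: list of D floats (hypothetical speaker vector).
    gamma_w, beta_w: C x D weight matrices; gamma_b, beta_b: length-C biases.
    """
    gamma = linear(gamma_w, gamma_b, speaker_embedding)  # per-channel scale
    beta = linear(beta_w, beta_b, speaker_embedding)     # per-channel shift
    # The same gamma and beta are shared across all time frames.
    return [[g * x + b for g, x, b in zip(gamma, frame, beta)]
            for frame in features]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]   # 2 frames, 2 channels
    emb = [1.0, 1.0]                   # toy 2-dimensional embedding
    out = film(feats, emb,
               gamma_w=[[2.0, 0.0], [0.0, 3.0]], gamma_b=[0.0, 0.0],
               beta_w=[[0.0, 0.0], [0.0, 0.0]], beta_b=[1.0, -1.0])
    print(out)
```

In this sketch the only added parameters are the two small projections that produce gamma and beta, so the cost grows with the number of modulated channels times the embedding size. Setting the gamma bias to 1 and all other parameters to 0 makes the layer an identity, which is one common way to start from an unmodulated model.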


