Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

Rusci, Manuele, Paci, Francesco, Fariselli, Marco, Flamand, Eric, Tuytelaars, Tinne

Aug-22-2024–arXiv.org Artificial Intelligence

This paper proposes a self-learning framework to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after deployment on ultra-low-power smart audio sensors. We address the fundamental problem of missing labeled training data by assigning pseudo-labels to newly recorded audio frames based on a similarity score with respect to a few user recordings. By experimenting with multiple KWS models of up to 0.5M parameters on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real time with an average power cost of up to 8.2 mW. On the same platform, we estimate an energy cost for on-device training that is 10x lower than the labeling energy when sampling a new utterance every 5 s or 16.4 s with a DS-CNN-S or a DS-CNN-M model, respectively. Our empirical results pave the way to self-adaptive personalized KWS sensors at the extreme edge.
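The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the decision rule, so everything below is an assumption made for illustration: cosine similarity against the mean of the user's embeddings, two hypothetical thresholds (`pos_threshold`, `neg_threshold`), and toy 3-dimensional embeddings standing in for the output of a KWS feature extractor. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prototype(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the few user-provided keyword embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def pseudo_label(
    frame_embedding: Sequence[float],
    proto: Sequence[float],
    pos_threshold: float = 0.8,  # hypothetical value, not from the paper
    neg_threshold: float = 0.3,  # hypothetical value, not from the paper
) -> Optional[int]:
    """Return 1 (keyword), 0 (not keyword), or None (too uncertain to label)."""
    score = cosine_similarity(frame_embedding, proto)
    if score >= pos_threshold:
        return 1
    if score <= neg_threshold:
        return 0
    return None  # ambiguous frames are skipped rather than mislabeled


# Toy embeddings standing in for a feature extractor's output (assumption).
user_recordings = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.1, 0.1]]
proto = prototype(user_recordings)

new_frames = [[0.95, 0.05, 0.05], [0.0, 1.0, 0.0], [0.6, 0.6, 0.0]]
labels = [pseudo_label(frame, proto) for frame in new_frames]
print(labels)  # [1, 0, None]
```

Only frames that receive a label of 0 or 1 would be kept as training samples for fine-tuning; leaving ambiguous frames unlabeled is one plausible way to limit the label noise that a similarity-based scheme introduces.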

artificial intelligence, deep learning, machine learning

Country:
- Asia > Russia (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe
  - Russia > Northwestern Federal District
    - Leningrad Oblast > Saint Petersburg (0.04)
  - France > Auvergne-Rhône-Alpes
    - Isère > Grenoble (0.04)
  - Belgium > Flanders
    - Flemish Brabant > Leuven (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.93)
    - Statistical Learning (0.93)