Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors
Rusci, Manuele, Paci, Francesco, Fariselli, Marco, Flamand, Eric, Tuytelaars, Tinne
arXiv.org Artificial Intelligence
This paper proposes a self-learning framework to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after deployment on ultra-low-power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to newly recorded audio frames based on a similarity score with respect to a few user recordings. By experimenting with multiple KWS models with up to 0.5M parameters on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real time with an average power cost of up to 8.2 mW. On the same platform, we estimate an energy cost for on-device training 10x lower than the labeling energy if sampling a new utterance every 5 s or 16.4 s with a DS-CNN-S or a DS-CNN-M model, respectively. Our empirical results pave the way to self-adaptive personalized KWS sensors at the extreme edge.
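The pseudo-labeling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of cosine similarity, the averaged prototype, the two thresholds (`pos_thr`, `neg_thr`), and all function names are assumptions made for illustration, and the embeddings are toy vectors rather than real KWS features.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prototype(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average the embeddings of the few user recordings into one prototype."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def pseudo_label(
    frame: Sequence[float],
    proto: Sequence[float],
    pos_thr: float = 0.9,  # hypothetical threshold, not taken from the paper
    neg_thr: float = 0.3,  # hypothetical threshold, not taken from the paper
) -> Optional[int]:
    """Return 1 (keyword), 0 (not keyword), or None (too uncertain to label)."""
    score = cosine_similarity(frame, proto)
    if score >= pos_thr:
        return 1
    if score <= neg_thr:
        return 0
    return None


# Toy 2-D embeddings standing in for feature vectors of user recordings.
user_recordings = [[1.0, 0.0], [0.9, 0.1]]
proto = prototype(user_recordings)

new_frames = [[1.0, 0.05], [0.0, 1.0], [1.0, 1.0]]
labels = [pseudo_label(f, proto) for f in new_frames]
```

In this sketch, frames that fall between the two thresholds are left unlabeled, so only confident pseudo-labels would be passed on to fine-tuning.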
Aug-22-2024
- Country:
  - Asia > Russia (0.04)
  - North America > United States
    - New York > New York County > New York City (0.04)
  - Europe
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Education (0.46)
- Technology: