Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
Sun, Zhaokai; Zhang, Li; Wang, Qing; Zhou, Pan; Xie, Lei
arXiv.org Artificial Intelligence
Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve acoustic representation, we explore the effectiveness of state-of-the-art self-supervised learning (SSL) models, including WavLM and wav2vec 2.0, while incorporating a speaker attention module to enrich features with frame-level speaker information. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set, demonstrating its robustness and effectiveness in OSD.
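For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score for overlap detection could be computed at the frame level. It assumes binary per-frame labels (1 = overlapped speech, 0 = otherwise). The function name `frame_f1` and the toy label sequences are hypothetical, and the paper's exact evaluation protocol (frame size, collars, scoring tool) is not stated in this abstract.

```python
from typing import Sequence


def frame_f1(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Frame-level F1 for binary overlap labels (1 = overlapped speech).

    Illustrative only: the frame resolution and scoring rules used in the
    paper are not given in the abstract, so none are assumed here.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # overlap correctly found
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # false alarm
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # missed overlap

    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with eight frames (made-up labels, not data from the paper).
ref = [0, 1, 1, 1, 0, 0, 1, 0]
hyp = [0, 1, 1, 0, 0, 1, 1, 0]
print(f"{frame_f1(ref, hyp):.4f}")
```

In this toy case there are three true positives, one false alarm, and one miss, so precision and recall are both 0.75 and the F1 score is 0.75.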
May-30-2025