Exploring Speaker Diarization with Mixture of Experts

Yang, Gaobin, He, Maokui, Niu, Shutong, Wang, Ruoyu, Chen, Hang, Du, Jun

Jun-18-2025–arXiv.org Artificial Intelligence

In this paper, we propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S). The system leverages a memory module to enhance speaker embeddings and employs a sequence-to-sequence (Seq2Seq) framework to efficiently map acoustic features to speaker labels. Additionally, we explore the application of mixture of experts in speaker diarization and introduce a Shared and Soft Mixture of Experts (SS-MoE) module to further mitigate model bias and enhance performance. Incorporating SS-MoE yields the extended model NSD-MS2S-SSMoE. Experiments on multiple complex acoustic datasets, including the CHiME-6, DiPCo, Mixer 6, and DIHARD-III evaluation sets, demonstrate meaningful improvements in robustness and generalization. The proposed methods achieve state-of-the-art results, showcasing their effectiveness in challenging real-world scenarios.

Speaker diarization, which aims to determine the temporal boundaries of individual speakers within an audio stream and assign appropriate speaker identities, addresses the fundamental question of "who spoke when" [1]. It serves as a foundational component in numerous downstream speech-related tasks, including automatic meeting summarization, conversational analysis, and dialogue transcription [2].
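To make the "who spoke when" output concrete, the following is a minimal sketch of how frame-level speaker labels could be turned into time-stamped segments. It is not taken from the paper: the function name `activity_to_segments`, the 0/1 activity matrix layout (frames by speakers), and the 10 ms default frame shift are all assumptions chosen for illustration.

```python
import numpy as np


def activity_to_segments(activity, frame_shift=0.01):
    """Convert a (num_frames, num_speakers) 0/1 matrix into segments.

    Hypothetical helper for illustration only. Returns a list of
    (speaker_index, start_seconds, end_seconds) tuples sorted by start time.
    The frame_shift default of 10 ms is an assumption.
    """
    activity = np.asarray(activity, dtype=int)
    num_frames, num_speakers = activity.shape
    segments = []
    for spk in range(num_speakers):
        # Pad with zeros so every active run has a detectable start and end.
        padded = np.concatenate(([0], activity[:, spk], [0]))
        change = np.diff(padded)
        starts = np.flatnonzero(change == 1)   # first active frame of each run
        ends = np.flatnonzero(change == -1)    # first inactive frame after each run
        for s, e in zip(starts, ends):
            segments.append((spk, float(s * frame_shift), float(e * frame_shift)))
    # Sort by start time so the result reads as a timeline.
    return sorted(segments, key=lambda seg: seg[1])
```

Because each speaker is handled in its own column, overlapping speech is represented naturally: two speakers active in the same frame simply produce two segments that intersect in time. For example, `activity_to_segments([[1, 0], [1, 1], [0, 1]], frame_shift=1.0)` yields one segment per speaker, and the two segments share the middle frame.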
