Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation

Miseul Kim, Soo Jin Park, Kyungguen Byun, Hyeon-Kyeong Shin, Sunkuk Moon, Shuhua Zhang, Erik Visser

arXiv.org Artificial Intelligence 

Speaker diarization systems are sensitive to intra-speaker variability, which can cause segments from the same speaker to be misclassified as different individuals, for example, when a speaker raises their voice or speaks faster during a conversation. To address this, we propose a style-controllable speech generation model that augments speech across diverse styles while preserving the target speaker's identity. The proposed system starts with diarized segments from a conventional diarizer. For each diarized segment, it generates augmented speech samples enriched with phonetic and stylistic diversity. Speaker embeddings from both the original and generated audio are then blended to enhance the system's robustness in grouping segments with high intrinsic intra-speaker variability.
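The blending step can be sketched as a weighted combination of L2-normalized embeddings. This is a minimal illustration only: the abstract says the embeddings are "blended" but does not specify the scheme, so the averaging of augmented embeddings, the mixing weight `alpha`, and the function name are all assumptions.

```python
import numpy as np

def blend_embeddings(orig_emb, aug_embs, alpha=0.5):
    """Blend a segment's original speaker embedding with embeddings
    extracted from style-augmented generations of that segment.

    orig_emb : (D,) embedding from the original diarized segment.
    aug_embs : (N, D) embeddings from the N augmented samples.
    alpha    : weight on the original embedding (assumed scheme;
               the paper only says the embeddings are "blended").
    """
    orig = orig_emb / np.linalg.norm(orig_emb)
    # Average the augmented embeddings into one vector, then normalize.
    aug_mean = np.mean(aug_embs, axis=0)
    aug_mean = aug_mean / np.linalg.norm(aug_mean)
    # Weighted mix, re-normalized so downstream cosine scoring is unaffected.
    blended = alpha * orig + (1.0 - alpha) * aug_mean
    return blended / np.linalg.norm(blended)
```

The blended, unit-norm embedding can then be fed to the same cosine-similarity clustering used by the baseline diarizer, making segments with atypical styles (shouting, fast speech) cluster closer to the speaker's other segments.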