SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang
arXiv.org Artificial Intelligence
ABSTRACT: In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual awareness, with large-scale pretraining on simulated code-switching data. By jointly leveraging these two components, our method overcomes the limitations of conventional approaches in data scarcity and architecture optimization, and generalizes effectively to real-world multilingual settings across diverse environments. Experimental results demonstrate that our approach achieves state-of-the-art performance on several language diarization benchmarks, with a relative performance improvement of 23% to 52% over previous methods. We believe that this work not only advances research in language diarization but also establishes a foundational framework for code-switching speech technologies.
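The abstract mentions pretraining on simulated code-switching data. The paper's actual simulation pipeline is not described here, but a common baseline idea is to concatenate monolingual segments from different languages into one utterance while keeping per-sample language labels. The sketch below illustrates that idea only; the function name, sample rate, and integer language IDs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

SR = 16000  # assumed sample rate (Hz)

def simulate_code_switch(segments, rng):
    """Concatenate monolingual segments in random order into one
    simulated code-switched utterance with per-sample language labels.

    `segments`: list of (waveform, lang_id) pairs, where `waveform`
    is a 1-D float array and `lang_id` is an integer language tag.
    Returns (audio, labels) arrays of equal length.
    """
    order = rng.permutation(len(segments))
    audio, labels = [], []
    for i in order:
        wav, lang = segments[i]
        audio.append(wav)
        # label every sample of this segment with its language ID
        labels.append(np.full(len(wav), lang, dtype=np.int64))
    return np.concatenate(audio), np.concatenate(labels)

# Toy example: 1 s of "language 0" noise and 0.5 s of "language 1" noise
rng = np.random.default_rng(0)
segs = [(rng.standard_normal(SR), 0), (rng.standard_normal(SR // 2), 1)]
mix, lab = simulate_code_switch(segs, rng)
assert len(mix) == len(lab) == SR + SR // 2
```

A real pipeline would additionally need to draw segment durations and language-switch points from realistic distributions; this toy version only shows how audio and frame-level language labels stay aligned.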
Oct-2-2025