SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
arXiv.org Artificial Intelligence
Intra-sentence multilingual speech synthesis (code-switching TTS) remains a major challenge due to abrupt language shifts, varied scripts, and mismatched prosody between languages. Conventional TTS systems are typically monolingual and fail to produce natural, intelligible speech in mixed-language contexts. We introduce Script-First Multilingual Synthesis with Adaptive Locale Resolution (SFMS-ALR), an engine-agnostic framework for fluent, real-time code-switched speech generation. SFMS-ALR segments input text by Unicode script, applies adaptive language identification to determine each segment's language and locale, and normalizes prosody using sentiment-aware adjustments to preserve expressive continuity across languages. The algorithm generates a unified SSML representation with appropriate <lang> or <voice> spans and synthesizes the utterance in a single TTS request. Unlike end-to-end multilingual models, SFMS-ALR requires no retraining and integrates seamlessly with existing voices from Google, Apple, Amazon, and other providers. Comparative analysis with data-driven pipelines such as Unicom and Mask LID demonstrates SFMS-ALR's flexibility, interpretability, and immediate deployability. The framework establishes a modular baseline for high-quality, engine-independent multilingual TTS and outlines evaluation strategies for intelligibility, naturalness, and user preference.
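The script-first pipeline summarized in the abstract (segment by Unicode script, resolve a locale per segment, emit one SSML document) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the code-point table `SCRIPT_RANGES`, the per-script `DEFAULT_LOCALE` map, the optional `lid` callback (a hypothetical stand-in for the adaptive language-identification step), and all function names are invented for this example. Sentiment-aware prosody normalization is omitted, and only `<lang>` spans are emitted (no `<voice>` switching).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from xml.sax.saxutils import escape

# Illustrative, non-exhaustive code-point ranges per Unicode script (an assumption,
# not the paper's table). Only letters are classified; marks, digits, spaces,
# punctuation and letters from unlisted scripts are treated as script-neutral.
SCRIPT_RANGES = [
    ("Latin", 0x0041, 0x024F),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Arabic", 0x0600, 0x06FF),
    ("Devanagari", 0x0900, 0x097F),
    ("Han", 0x4E00, 0x9FFF),
    ("Hangul", 0xAC00, 0xD7AF),
]

# Hypothetical default locale per script, used when the LID hook gives no answer.
DEFAULT_LOCALE = {
    "Latin": "en-US", "Cyrillic": "ru-RU", "Arabic": "ar-SA",
    "Devanagari": "hi-IN", "Han": "zh-CN", "Hangul": "ko-KR",
}


@dataclass
class Segment:
    script: str
    text: str


def script_of(ch: str) -> Optional[str]:
    """Return the script of a letter, or None for script-neutral characters."""
    if not ch.isalpha():  # combining marks, digits, spaces, punctuation
        return None
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def segment_by_script(text: str) -> List[Segment]:
    """Split text into maximal runs of one script; neutral chars join the current run."""
    segments: List[Segment] = []
    pending = ""  # neutral characters seen before the first classified letter
    for ch in text:
        script = script_of(ch)
        if script is None:
            if segments:
                segments[-1].text += ch
            else:
                pending += ch
        elif segments and segments[-1].script == script:
            segments[-1].text += ch
        else:
            segments.append(Segment(script, pending + ch))
            pending = ""
    if pending:  # the input contained no classified letters at all
        segments.append(Segment("Common", pending))
    return segments


def resolve_locale(seg: Segment, base_locale: str,
                   lid: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Ask the hypothetical LID hook first, then fall back to the script default."""
    if lid is not None:
        guess = lid(seg.text)
        if guess:
            return guess
    return DEFAULT_LOCALE.get(seg.script, base_locale)


def to_ssml(text: str, base_locale: str = "en-US",
            lid: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Build one SSML 1.1 document, wrapping non-base-locale runs in <lang>."""
    runs: List[List[str]] = []  # [locale, text], adjacent equal locales merged
    for seg in segment_by_script(text):
        loc = resolve_locale(seg, base_locale, lid)
        if runs and runs[-1][0] == loc:
            runs[-1][1] += seg.text
        else:
            runs.append([loc, seg.text])
    body = []
    for loc, chunk in runs:
        if loc == base_locale:
            body.append(escape(chunk))
        else:
            body.append(f'<lang xml:lang="{loc}">{escape(chunk)}</lang>')
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="{base_locale}">' + "".join(body) + "</speak>")
```

For example, `to_ssml("Hello नमस्ते world")` returns `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Hello <lang xml:lang="hi-IN">नमस्ते </lang>world</speak>`, a single document that could be submitted to an engine in one request. Script-neutral characters are attached to the preceding run so they never open a span of their own, and adjacent segments that resolve to the same locale are merged, so each language switch yields exactly one `<lang>` element. Scripts shared by several languages (Latin for English, Spanish, French, and so on) are where a real `lid` function would be needed; engines also differ in which SSML elements they honor.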
Oct-30-2025
- Country:
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States (0.04)
- Genre:
- Research Report (0.51)
- Industry:
- Information Technology > Services (0.47)
- Technology: