A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
Borodin, Kirill, Vasiliev, Nikita, Kudryavtsev, Vasiliy, Maslov, Maxim, Gorodnichev, Mikhail, Rogov, Oleg, Mkrtchian, Grach
arXiv.org Artificial Intelligence
This work is still in progress. Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks.
Jul-21-2025
- Country:
  - Asia
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.14)
  - Europe
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States > North Dakota
      - Grand Forks County > Grand Forks (0.14)
- Genre:
  - Research Report > New Finding (0.89)
- Industry:
  - Law (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.67)
    - Natural Language (1.00)
    - Speech > Speech Recognition (1.00)