TTS technology
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Pawar, Pranav, Dwivedi, Akshansh, Boricha, Jenish, Gohil, Himanshu, Dubey, Aditya
Although state-of-the-art text-to-speech (TTS) systems achieve high naturalness in monolingual settings, synthesizing speech with correct multilingual accents (especially for Indic languages) and context-relevant emotions remains difficult owing to cultural nuance discrepancies in current frameworks. This paper introduces a new TTS architecture that integrates accent modelling with transliteration preservation and multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents. Our approach extends the Parler-TTS model by integrating a language-specific phoneme alignment hybrid encoder-decoder architecture and culture-sensitive emotion embedding layers trained on native speaker corpora, and by incorporating dynamic accent code switching with residual vector quantization. Quantitative tests demonstrate a 23.7% improvement in accent accuracy (Word Error Rate reduction from 15.4% to 11.8%) and 85.3% emotion recognition accuracy from native listeners, surpassing the METTS and VECL-TTS baselines. The novelty of the system is that it can perform code-mixing in real time, generating statements such as "Namaste, let's talk about
A review-based study on different Text-to-Speech technologies
Chowdhury, Md. Jalal Uddin, Hussan, Ashab
This research paper presents a comprehensive review-based study on various Text-to-Speech (TTS) technologies. TTS technology is an important aspect of human-computer interaction, enabling machines to convert written text into audible speech. The paper examines the different TTS technologies available, including concatenative TTS, formant synthesis TTS, and statistical parametric TTS. The study focuses on comparing the advantages and limitations of these technologies in terms of their naturalness of voice, the level of complexity of the system, and their suitability for different applications. In addition, the paper explores the latest advancements in TTS technology, including neural TTS and hybrid TTS. The findings of this research will provide valuable insights for researchers, developers, and users who want to understand the different TTS technologies and their suitability for specific applications.
Less-Known Facts About AI Voices and Text-to-Speech
Voice artificial intelligence is an emerging technology that uses voice commands to interact with humans. The technology is seeing tremendous growth and intense research as engineers explore untapped areas. We are well accustomed to hearing AI voices narrating articles and reports in a monotone. Among the most popular examples are Alexa- and Siri-enabled devices. These devices are gaining significant recognition, and the market for similar products is growing rapidly.
How innovations in voice have made it an end-to-end commerce channel
Text-to-speech (TTS) technology isn't exactly new – but the way it's shaping the future certainly is. From smart speakers to voice assistants, TTS is increasingly important in day-to-day interactions between brands and end users, leading to enhanced brand experiences and better business outcomes. Until recently, TTS was confined to a specific use case: voice-enablement of written content to make computers 'speak' to those with visual or reading impairments. TTS technology was based on utility and a need to make screen-related content accessible. As such, synthetic speech was traditionally digital-sounding and marred by poor audio quality and speaking style.