End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2
Tathe, Aniket, Kamble, Anand, Kumbharkar, Suyash, Bhandare, Atharva, Mitra, Anirban C.
arXiv.org Artificial Intelligence
Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a transformative solution to this persistent obstacle: an end-to-end speech conversion framework tailored for Hindi-to-English translation, culminating in the synthesis of English audio. By integrating cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech recognition (ASR), mBART for neural machine translation (NMT), and a Text-to-Speech (TTS) synthesis component, this framework offers a unified and seamless approach to cross-lingual communication. We delve into the intricate details of each component, elucidating their individual contributions and exploring the synergies that enable a fluid transition from spoken Hindi to synthesized English audio.
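The three-stage flow described in the abstract (ASR, then NMT, then TTS) can be sketched as a simple composition of callables. This is only an illustrative sketch, not the authors' implementation: the class name `SpeechToSpeechPipeline`, the stage signatures, and the stand-in functions `fake_asr`, `fake_nmt`, and `fake_tts` are all hypothetical. In a real system the stand-ins would be replaced by wrappers around a fine-tuned XLSR Wav2Vec2 model, an mBART model, and a TTS model.

```python
from dataclasses import dataclass
from typing import Callable

# Stage signatures are assumptions made for illustration only.
Transcriber = Callable[[bytes], str]   # Hindi audio -> Hindi text (ASR)
Translator = Callable[[str], str]      # Hindi text -> English text (NMT)
Synthesizer = Callable[[str], bytes]   # English text -> English audio (TTS)


@dataclass
class SpeechToSpeechPipeline:
    """Chains ASR, NMT and TTS stages; each stage is injected as a callable."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def run(self, hindi_audio: bytes) -> bytes:
        hindi_text = self.transcribe(hindi_audio)     # speech recognition
        english_text = self.translate(hindi_text)     # machine translation
        return self.synthesize(english_text)          # speech synthesis


# Hypothetical stand-in stages so the sketch runs without model weights.
def fake_asr(audio: bytes) -> str:
    # Pretends the "audio" bytes are just UTF-8 encoded text.
    return audio.decode("utf-8")


def fake_nmt(text: str) -> str:
    # Tiny lookup table standing in for a translation model.
    return {"नमस्ते": "hello"}.get(text, text)


def fake_tts(text: str) -> bytes:
    # Pretends the synthesized "audio" is just the encoded text.
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(fake_asr, fake_nmt, fake_tts)
english_audio = pipeline.run("नमस्ते".encode("utf-8"))
```

Injecting each stage as a callable keeps the stages independently replaceable, which is one plausible way to read the abstract's "unified" framework of separate components.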
Jan-10-2024
- Country:
  - Asia > India
    - Maharashtra > Pune (0.05)
  - Europe > Germany (0.04)
  - North America > United States
    - Massachusetts > Middlesex County > Cambridge (0.04)
- Genre:
  - Overview > Innovation (0.35)
  - Research Report > Promising Solution (0.34)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language > Machine Translation (1.00)
    - Speech > Speech Recognition (1.00)