Unwritten language


PolyVoice: Language Models for Speech to Speech Translation

Dong, Qianqian, Huang, Zhiying, Tian, Qiao, Xu, Chen, Ko, Tom, Zhao, Yunlong, Feng, Siyuan, Li, Tang, Wang, Kexin, Cheng, Xuxin, Yue, Fengpeng, Bai, Ye, Chen, Xi, Lu, Lu, Ma, Zejun, Wang, Yuping, Wang, Mingxuan, Wang, Yuxuan

arXiv.org Artificial Intelligence

We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST). The framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, so the framework can also be applied to unwritten languages. For speech synthesis, we adopt the existing VALL-E X approach and build a unit-based audio language model, which allows the framework to preserve the voice characteristics and speaking style of the original speech. We evaluate the system on Chinese→English and English→Spanish pairs. Experimental results show that it generates speech with high translation quality and audio quality. Speech samples are available at https://speechtranslation.github.io/polyvoice.
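The abstract describes a cascaded, unit-based pipeline: source speech is discretized into units, a translation language model maps source units to target-language units, and a VALL-E X-style audio language model synthesizes speech from those units while conditioning on the original speaker. The sketch below illustrates only this data flow; every function is a toy, hypothetical stand-in, not the actual PolyVoice models.

```python
# Toy sketch of a two-stage, unit-based S2ST pipeline as described in the
# PolyVoice abstract. All components are hypothetical stand-ins that only
# mimic the shape of the data flow:
#   source speech -> discrete units -> translated units -> target speech.

def speech_to_units(waveform):
    """Stand-in for an unsupervised speech discretizer (e.g. clustered
    self-supervised features). Toy: bucket each sample into 4 units."""
    return [int(abs(x) * 4) % 4 for x in waveform]

def translation_lm(source_units):
    """Stand-in for the translation language model, which maps source-language
    units to target-language units. Toy: a fixed unit-to-unit table."""
    mapping = {0: 2, 1: 3, 2: 0, 3: 1}
    return [mapping[u] for u in source_units]

def unit_to_speech_lm(target_units, speaker_prompt):
    """Stand-in for the VALL-E X-style unit-based audio LM. Conditioning on a
    prompt from the original speaker is what lets the real system preserve
    voice characteristics. Toy: scale each unit by the prompt's mean energy."""
    energy = sum(abs(x) for x in speaker_prompt) / max(len(speaker_prompt), 1)
    return [u * energy for u in target_units]

def polyvoice_style_s2st(source_waveform):
    """End-to-end toy pipeline: discretize, translate, then synthesize,
    using the original speech itself as the speaker prompt."""
    units = speech_to_units(source_waveform)
    target_units = translation_lm(units)
    return unit_to_speech_lm(target_units, source_waveform)

# Example: a two-sample "waveform" flows through all three stages.
output = polyvoice_style_s2st([0.1, 0.9])
```

Because the units are produced without any transcription, nothing in the pipeline requires a written form of either language, which is why the authors note the approach extends to unwritten languages.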


The Morning After: The Silent Hill universe is expanding, with help from J.J. Abrams

Engadget

Konami today dropped a ton of news about the future of its iconic horror franchise. Aside from confirming the remake of Silent Hill 2, the company revealed three new games. Townfall comes from Annapurna Interactive and No Code, a Glasgow studio known for strong narrative titles like Observation and Stories Untold. Judging by its short teaser, Townfall looks to be the most traditional Silent Hill game of the trio. Ascension, due out in 2023, is the least game-like installment, but it will feature the influence of J.J. Abrams.


Meta AI announces first AI-powered speech translation system for an unwritten language

#artificialintelligence

Did you miss a session from MetaBeat 2022? Head over to the on-demand library for all of our featured sessions here. Artificial speech translation is a rapidly emerging artificial intelligence (AI) technology. Initially created to aid communication among people who speak different languages, speech-to-speech translation (S2ST) technology has found its way into several domains. For example, global tech conglomerates now use S2ST to directly translate shared documents and audio conversations in the metaverse.


Meta's AI translator can interpret unwritten languages

Engadget

Roughly four in ten of the world's approximately 7,000 known languages exist without an accompanying written component. These unwritten languages pose a unique problem for modern machine translation systems, which typically need to convert speech to written text before translating it into the new language and then converting the text back to speech. It's a problem Meta has reportedly addressed with its latest open-source language AI advancement. The work is part of Meta's Universal Speech Translator (UST) program, which aims to develop real-time speech-to-speech translation so that Metaverse denizens can more easily interact (read: sexually harass one another). For this project, Meta researchers looked at Hokkien, an unwritten language spoken throughout Asia's diaspora and one of Taiwan's official languages. Machine learning translation systems typically require extensive labeled examples of the language, both written and spoken, to train on, which is precisely what unwritten languages like Hokkien don't have.