

A lifetime subscription to this intuitive text-to-speech software is on sale for under £30


TL;DR: A lifetime subscription to TexTalky AI Text-to-Speech is on sale for £28.08, saving you 93% on list price. From marketing content and video narration to customer support and tutorials, there are many instances in today's marketplace when a professional human voice is needed. But due to time constraints, a lack of proper recording equipment, or simply the fact that you hate your voice, you may turn to text-to-speech software. Sometimes the robotic voices from these apps leave a lot to be desired. TexTalky AI Text-to-Speech aims to convert your text to lifelike human voices in just a few seconds.

Build A Text-To-Speech App Using Client-Side JavaScript


In today's information age, the impacts of digital transformation are visible across many industries and sectors.

This scanner pen turns text to speech, translates words, and more


TL;DR: As of Feb. 11, you can slash 37% off this NEWYES Scan Reader Pen 3 Text-to-Speech OCR Multilingual Translator and get it for $124.99 instead of $199. If you are studying a second language, taking lots of notes for work or school, struggling with written text, or just want an easier way to get through the stack of books on your nightstand, there are tools that can help you out. One that's making its mark -- and happens to be on sale -- is the NEWYES Scan Reader Pen 3. The NEWYES Scan opens up new possibilities for learning. You can use it to read and retain information, translate words and phrases, look up words on the spot, capture quotes and transfer them to your computer, or even record audio to review later. This text-to-speech reader pen recognizes 3,000 characters per minute and translates in 0.3 seconds with 98 percent accuracy.

Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

In cross-lingual speech synthesis, the speech in various languages can be synthesized for a monoglot speaker. Normally, only the data of monoglot speakers are available for model training, thus the speaker similarity is relatively low between the synthesized cross-lingual speech and the native language recordings. Based on the multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve the cross-lingual speaker similarity. To further improve the speaker similarity, joint training with a speaker classifier is proposed. Here, a scheme similar to parallel scheduled sampling is proposed to train the transformer model efficiently to avoid breaking the parallel training mechanism when introducing joint training. By using multi-task learning and speaker classifier joint training, in subjective and objective evaluations, the cross-lingual speaker similarity can be consistently improved for both the seen and unseen speakers in the training set.

XPeng upgrades EV voice assistant with Microsoft text-to-speech tech – FutureIoT


With a deep understanding of urban mobility, we are finding many more scenarios to leverage AI technology for a high level of driver-machine …

Innovation Award Honorees - CES 2022


OrCam MyEye PRO is a wearable assistive technology device for people who are blind, visually impaired or have reading challenges. It's lightweight, finger-size and magnetically mounts on eyeglass frames. The device instantly reads aloud any printed text (books, menus, signs) and digital screens (computer, smartphone), recognizes faces, and identifies products/bar codes, money notes and colors – all in real time and offline. The interactive Smart Reading feature enables users to tailor their assistive reading experience, and Orientation assists with guidance and identification of objects. Newly released "Hey OrCam" enables control of all device features and settings hands-free, using voice commands.

Guided-TTS: Text-to-Speech with Untranscribed Speech

Most neural text-to-speech (TTS) models require paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given transcript. We show that Guided-TTS achieves comparable performance with the existing methods without any transcript for LJSpeech. Our results further show that a single speaker-dependent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.
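The guidance idea in the abstract above rests on a standard identity: by Bayes' rule, the gradient of log p(x | y) equals the gradient of log p(x) plus the gradient of log p(y | x), so an unconditional model can be steered toward a condition by adding a classifier's gradient. The following is only a toy sketch of that identity on a one-dimensional, two-class Gaussian mixture, not the paper's model. The class means, the function names, and the `scale` parameter are all assumptions made for illustration.

```python
import math

# Hypothetical toy setup: two classes with equal priors and unit variance.
MU0, MU1 = -2.0, 2.0


def gauss(x: float, mu: float) -> float:
    """Density of a unit-variance Gaussian with mean mu."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def posterior(x: float) -> float:
    """Classifier output p(y=1 | x) for the two-class mixture."""
    a, b = gauss(x, MU0), gauss(x, MU1)
    return b / (a + b)


def unconditional_score(x: float) -> float:
    """d/dx log p(x), where p(x) is the mixture of both classes."""
    s = posterior(x)
    return (1 - s) * (-(x - MU0)) + s * (-(x - MU1))


def classifier_grad(x: float) -> float:
    """d/dx log p(y=1 | x); the posterior is a sigmoid with slope MU1 - MU0."""
    return (1 - posterior(x)) * (MU1 - MU0)


def guided_score(x: float, scale: float = 1.0) -> float:
    """Unconditional score plus (optionally scaled) classifier gradient."""
    return unconditional_score(x) + scale * classifier_grad(x)


if __name__ == "__main__":
    for x in (-3.0, 0.0, 1.5):
        # With scale 1.0 the guided score should match the true score of
        # class 1, which for a unit-variance Gaussian is -(x - MU1).
        print(x, guided_score(x), -(x - MU1))
```

In this toy case the guided score with `scale=1.0` coincides with the exact conditional score. In a real diffusion model both terms would be learned networks evaluated on noisy mel-spectrograms at each denoising step.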

Guided-TTS: Text-to-Speech with Untranscribed Speech - Technology Org


Neural text-to-speech (TTS) models are successfully used to generate high-quality human-like speech. However, most TTS models can be trained only if transcribed data from the desired speaker is available. That means that long-form untranscribed data, such as podcasts, cannot be used to train existing models. A recent paper on arXiv proposes an unconditional diffusion-based generative model. It is trained on untranscribed data and leverages a phoneme classifier for text-to-speech synthesis.

Disney adds beloved characters as text-to-speech voices in TikTok – and bans them from saying 'lesbian' or 'gay'

The Independent - Tech

A text-to-speech TikTok voice made by Disney that made users sound like Rocket Raccoon did not allow users to 'say' words like "gay", "lesbian", or "queer". Numerous posts by users showed the feature failing to say the LGBTQ terms before it was quietly changed to allow the words. Words like "bisexual" and "transgender" were allowed by the feature. Originally, Rocket's voice would skip over the words when they were written normally but would pronounce them phonetically if a user wrote "qweer", for example. Attempts to make it read text that contained only the seemingly prohibited words resulted in an error message saying that text-to-speech was not supported by the language chosen.