synthetic voice
MND left her without a voice. Eight seconds of scratchy audio gave it back to her
"After such a long time, I couldn't really remember my voice," Sarah Ezekiel tells BBC Access All. "When I first heard it again, I felt like crying." The onset of motor neurone disease (MND) left Sarah without a voice and the use of her hands at the age of 34. It was within months of her becoming a mum for the second time.
- Europe > United Kingdom (0.97)
- Asia > Middle East (0.29)
- South America (0.15)
- North America > Central America (0.15)
- Government (0.96)
- Health & Medicine > Therapeutic Area > Rheumatology (0.35)
- Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.35)
Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices
Székely, Éva, Miniota, Jūra, Hejná, Míša
The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television is likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology
Moell, Birger, Aronsson, Fredrik Sand
This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria. Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology. Our contributions include demonstrating that voice cloning preserves dysarthric speech characteristics, analyzing differences between real and synthetic data, and discussing implications for diagnostics, rehabilitation, and communication. We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices. A licensed speech-language pathologist (SLP) evaluated a subset for dysarthria, speaker gender, and synthetic indicators. The SLP correctly identified dysarthria in all cases and speaker gender in 95% but misclassified 30% of synthetic samples as real, indicating high realism. Our results suggest synthetic speech effectively captures disordered characteristics and that voice cloning has advanced to produce high-quality data resembling real speech, even to trained professionals. This has critical implications for healthcare, where synthetic data can mitigate data scarcity, protect privacy, and enhance AI-driven diagnostics. By enabling the creation of diverse, high-quality speech datasets, voice cloning can improve generalizable models, personalize therapy, and advance assistive technologies for dysarthria. We publicly release our synthetic dataset to foster further research and collaboration, aiming to develop robust models that improve patient outcomes in speech-language pathology.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
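The rater evaluation summarized in the dysarthria voice-cloning abstract above reports the share of synthetic samples that were judged to be real. A minimal sketch of how such a rate could be tabulated follows. The `Rating` record, the function name, and the toy data are all hypothetical and are not taken from the study; the abstract does not state the number of samples, so the counts here are chosen only so the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One rater judgment about one audio clip (hypothetical record layout)."""
    is_synthetic: bool      # ground truth: clip was produced by voice cloning
    rated_synthetic: bool   # rater's judgment: clip sounds synthetic


def synthetic_rated_real_rate(ratings):
    """Fraction of truly synthetic clips that the rater judged to be real."""
    synthetic = [r for r in ratings if r.is_synthetic]
    if not synthetic:
        raise ValueError("no synthetic samples to evaluate")
    rated_real = sum(1 for r in synthetic if not r.rated_synthetic)
    return rated_real / len(synthetic)


# Toy data, NOT from the paper: 10 synthetic clips, 3 of them judged real,
# plus 10 real clips that were all judged real.
ratings = (
    [Rating(is_synthetic=True, rated_synthetic=True)] * 7
    + [Rating(is_synthetic=True, rated_synthetic=False)] * 3
    + [Rating(is_synthetic=False, rated_synthetic=False)] * 10
)

rate = synthetic_rated_real_rate(ratings)
print(f"Synthetic clips rated as real: {rate:.0%}")
```

Only the synthetic clips enter the denominator, so real clips in the evaluation set do not dilute the rate.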
Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
Lehečka, Jan, Hanzlíček, Zdeněk, Matoušek, Jindřich, Tihelka, Daniel
In this paper, we experimented with the SpeechT5 model pre-trained on large-scale datasets. We pre-trained the foundation model from scratch and fine-tuned it on a large-scale robust multi-speaker text-to-speech (TTS) task. We tested the model capabilities in a zero- and few-shot scenario. Based on two listening tests, we evaluated the synthetic audio quality and how closely the synthetic voices resemble real voices. Our results showed that the SpeechT5 model can generate a synthetic voice for any speaker using only one minute of the target speaker's data. We successfully demonstrated the high quality and similarity of our synthetic voices on publicly known Czech politicians and celebrities.
- Europe > Czechia (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Media (0.69)
- Leisure & Entertainment (0.47)
AI Can't Make Music
The first concert I bought tickets to after the pandemic subsided was a performance of the British singer-songwriter Birdy, held last April in Belgium. I've listened to Birdy more than to any other artist; her voice has pulled me through the hardest and happiest stretches of my life. I know every lyric to nearly every song in her discography, but that night Birdy's voice had the same effect as the first time I'd listened to her, through beat-up headphones connected to an iPod over a decade ago--a physical shudder, as if a hand had reached across time and grazed me, somehow, just beneath the skin. Countless people around the world have their own version of this ineffable connection, with Taylor Swift, perhaps, or the Beatles, Bob Marley, or Metallica. My feelings about Birdy's music were powerful enough to propel me across the Atlantic, just as tens of thousands of people flocked to the Sphere to see Phish earlier this year, or some 400,000 went to Woodstock in 1969.
- Europe > Belgium (0.25)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > New York (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Scarlett Johansson Says OpenAI Ripped Off Her Voice for ChatGPT
Last week OpenAI revealed a new conversational interface for ChatGPT with an expressive synthetic voice strikingly similar to that of the AI assistant played by Scarlett Johansson in the sci-fi movie Her--only to suddenly disable the new voice over the weekend. On Monday, Johansson issued a statement claiming to have forced that reversal, after her lawyers demanded OpenAI clarify how the new voice was created. Johansson's statement, relayed to WIRED by her publicist, claims that OpenAI CEO Sam Altman asked her last September to provide ChatGPT's new voice but that she declined. She describes being astounded to see the company demo a new voice for ChatGPT last week that sounded like her anyway. "When I heard the release demo I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference," the statement reads.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations
Pinhanez, Claudio, Fernandez, Raul, Grave, Marcelo, Nogima, Julio, Hoory, Ron
This poses challenges for applications interested in targeting specific demographics (e.g., an African American business or NGO; a voice-tutoring system for children that are not of White ethnicity, etc.). The ultimate goal of the project described in this paper is to provide to designers, developers, and enterprises the choice of having a professional voice which is clearly recognizable as African American, and therefore more able to address diversity and inclusiveness issues. Being more precise, our goal is to create an African American Text-to-Speech system, which we will refer to simply as an African American voice or AA voice, able to produce synthetic audio segments from standard English texts, and which will be recognized by African American speakers and non-speakers as sounding like a native African American speaker. The AA voice should exhibit a level of technical quality similar to the Standard American English (SAE) synthetic voices currently available through professional platforms. The evaluation of the technical quality of the AA voice, however, is not addressed in this paper, which focuses primarily on whether the AA voice can be recognized as sounding like an African American speaker. Linguists [27, 28] have described a continuum of dialects under what is often termed African American Vernacular English (AAVE). At one end of the spectrum, one finds the largest deviation from SAE in terms of lexicon (including slang), syntax and morphology, and phonological/phonetic properties. At the other end, AAVE speakers begin to approach SAE in terms of lexicon and grammar but still retain marked speech characteristics (primarily in terms of intonation, phonation, and vowel placement [14, 28]) which grant the speech a distinctive identity which listeners use as cues in the perception of African American English [44].
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > South Carolina > Greenville County > Greenville (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Asia > China (0.05)
- North America > United States > California > San Mateo County > Menlo Park (0.05)
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
AI generated Joe Rogan podcast stuns social media with 'terrifying' accuracy: 'Mind blowingly dangerous'
The video was made with ChatGPT and is not the actual words of Joe Rogan or Sam Altman. Artificial intelligence chatbot ChatGPT has created a 51-minute episode of The Joe Rogan Experience featuring nearly flawless representations of the podcast host's voice and the voice of OpenAI CEO Sam Altman. The episode begins with an AI-generated Rogan welcoming the audience to the first episode of the "Joe Rogan AI Experience," speaking in a manner and tone that is difficult to distinguish from the real person. "I'm your host, Joe Rogan, or at least that's what this AI model thinks I sound like. Let me tell you, folks, this is some next-level stuff we've got going on here today," faux Rogan continues.
- North America > United States > New York (0.06)
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > California > Orange County > Laguna Beach (0.05)
'Of course it's disturbing': will AI change Hollywood forever?
What will AI (artificial intelligence) do to Hollywood? Who better to answer that question than ChatGPT, a thrilling but scary chatbot developed by OpenAI. When the Guardian asked it about AI's potential impact on the film industry, it made the following points: Scriptwriting: AI can be used to analyze existing screenplays and create new ones, potentially leading to more efficient and cost-effective screenwriting. Pre-production: AI can be used to streamline the pre-production process, including casting, location scouting and storyboarding. Special effects: AI can be used to create more realistic and immersive special effects, potentially reducing the need for practical effects and saving time and money in post-production.
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- Europe > Russia > North Caucasian Federal District > Chechen Republic (0.05)
- North America > United States > New York (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)