audiobook
Extremists are using AI voice cloning to supercharge propaganda. Experts say it's helping them grow
While the artificial intelligence boom is upending sections of the music industry, voice-generating bots are also becoming a boon to another unlikely corner of the internet: extremist movements that are using them to recreate the voices and speeches of major figures in their milieu, and experts say it is helping them grow. "The adoption of AI-enabled translation by terrorists and extremists marks a significant evolution in digital propaganda strategies," said Lucas Webber, a senior threat intelligence analyst at Tech Against Terrorism and a research fellow at the Soufan Center.
- North America > United States (0.51)
- Europe > Ukraine (0.06)
- Oceania > Australia (0.05)
- Media (1.00)
- Law Enforcement & Public Safety > Terrorism (1.00)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.51)
- North America > United States > Tennessee (0.06)
- South America > Venezuela (0.05)
- North America > United States > New York (0.04)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis
Kalahroodi, Mohammad Javad Ranjbar, Faili, Heshaam, Shakery, Azadeh
Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech (TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary optimization method for precise audio-text alignment, and audio-text quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. To validate the dataset, we fine-tuned XTTS for Persian, achieving a naturalness Mean Opinion Score (MOS) of 3.6/5 and a Speaker Similarity Mean Opinion Score (SMOS) of 4.0/5, demonstrating ParsVoice's effectiveness for training multi-speaker TTS systems. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. To accelerate the development of Persian speech technologies, the complete dataset has been made publicly available at: https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice.
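The abstract names a binary search boundary optimization step but does not say how it works. The sketch below is one plausible reading, not the authors' method. It assumes a hypothetical predicate, `contains_full_sentence(end_time)`, that is False while the audio cut is too short to contain the target sentence and True once the cut is long enough. Given such a monotone predicate, binary search finds the earliest acceptable cut point to within a tolerance.

```python
from typing import Callable


def find_boundary(
    lo: float,
    hi: float,
    contains_full_sentence: Callable[[float], bool],
    tol: float = 0.01,
) -> float:
    """Return the earliest end time (seconds) in [lo, hi], within `tol`,
    at which the hypothetical predicate becomes True.

    Assumes the predicate is monotone: False before the true boundary,
    True at and after it.
    """
    if not contains_full_sentence(hi):
        raise ValueError("the sentence is not complete even at the upper bound")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if contains_full_sentence(mid):
            hi = mid  # cut is long enough; try an earlier end time
        else:
            lo = mid  # cut is too short; move the end time later
    return hi


# Toy stand-in for an alignment check: the sentence ends at 3.2 seconds.
boundary = find_boundary(0.0, 10.0, lambda t: t >= 3.2)
```

In a real pipeline the predicate would presumably come from comparing a transcript of the candidate audio cut against the target text; that comparison is left out here because the abstract does not describe it.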
Is reading always better for your brain than listening to audiobooks?
Reading books and listening to audiobooks tap into different elements of cognition, each with their own benefits. So which one should you choose, and when? But when a friend recently asked me whether her daughter was getting the same cognitive benefits from an audiobook as she would from reading, my instinct was to think "she's enjoying a book, the format doesn't matter". However, when I dug into the science, I found the medium does shape the mind in subtly different but meaningful ways.
- North America > United States > Virginia (0.05)
- North America > United States > North Dakota (0.05)
- Europe > United Kingdom > England > Devon > Exeter (0.05)
- Media > Publishing (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)
'AI doesn't know what an orgasm sounds like': audiobook actors grapple with the rise of robot narrators
When we think about what makes an audiobook memorable, it's always the most human moments: a catch in the throat when tears are near, or words spoken through a real smile. A Melbourne actor and audiobook narrator, Annabelle Tudor, says it's the instinct we have as storytellers that makes narration such a primal, and precious, skill. "The voice betrays how we're feeling really easily," she says. But as an art form it may be under threat. In May the Amazon-owned audiobook provider Audible announced it would allow authors and publishers to choose from more than 100 voices created by artificial intelligence to narrate audiobooks in English, Spanish, French and Italian, with AI translation of audiobooks expected to be available later in the year – news that was met with criticism and curiosity across the publishing industry.
- Oceania > Australia (0.07)
- North America > United States (0.05)
- Europe > United Kingdom (0.05)
Fox News AI Newsletter: Expert warns just 20 cloud images can make an AI deepfake video of your child
Texas high school student Elliston Berry joins 'Fox & Friends' to discuss the House's passage of a new bill that criminalizes the sharing of non-consensual intimate images, including content created with artificial intelligence. Welcome to Fox News' Artificial Intelligence newsletter with the latest AI technology advancements. IN TODAY'S NEWSLETTER: - Peek-a-boo, big tech sees you: Expert warns just 20 cloud images can make an AI deepfake video of your child - 5 AI terms you keep hearing and what they actually mean - AI to monitor NYC subway safety as crime concerns rise First Lady Melania Trump, joined by U.S. President Donald Trump, delivers remarks before President Trump signed the TAKE IT DOWN Act into law in the Rose Garden of the White House on May 19, 2025 in Washington, DC. The first lady made the Tools to Address Known Exploitation by Immobilizing Technological Deepfakes on Websites and Networks (TAKE IT DOWN) Act a priority, traveling to Capitol Hill to lobby lawmakers and show her support for the legislation, which addresses non-consensual intimate imagery, or "revenge porn," and artificial intelligence deepfakes posted online and to social media. DEEPFAKE DANGERS: Parents love capturing their kids' big moments, from first steps to birthday candles.
- North America > United States > Texas (0.26)
- North America > United States > District of Columbia > Washington (0.26)
AI Melania: First lady embarks on 'new frontier' in publishing with audiobook of memoir
EXCLUSIVE: First lady Melania Trump is launching an audiobook of her memoir using artificial intelligence (AI) audio technology in multiple languages, Fox News Digital has learned. The first lady released her first memoir, "Melania," last year. This week, she is breaking new ground by releasing "Melania, the Audiobook," which has been "created entirely" with AI. "I am proud to be at the forefront of publishing's new frontier – the intersection of artificial intelligence technology and audio," Trump told Fox News Digital. The first lady said ElevenLabs AI developed "an AI-generated replica of my voice under strict supervision, which will establish an unforgettable connection with my personal story, in multiple languages for listeners worldwide." ElevenLabs AI CEO Mati Staniszewski told Fox News Digital that they are "excited that Melania Trump trusted our technology to power this first-of-its-kind audiobook project."
- Media > Publishing (1.00)
- Government (1.00)
- Media > News (0.89)
MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers
Park, Kyeongman, Joo, Seongho, Jung, Kyomin
We introduce MultiActor-Audiobook, a zero-shot approach for generating audiobooks that automatically produces consistent, expressive, and speaker-appropriate prosody, including intonation and emotion. Previous audiobook systems have several limitations: they require users to manually configure the speaker's prosody, read each sentence with a monotonic tone compared to voice actors, or rely on costly training. However, our MultiActor-Audiobook addresses these issues by introducing two novel processes: (1) MSP (**Multimodal Speaker Persona Generation**) and (2) LSI (**LLM-based Script Instruction Generation**). With these two processes, MultiActor-Audiobook can generate more emotionally expressive audiobooks with a consistent speaker prosody without additional training. We compare our system with commercial products, through human and MLLM evaluations, achieving competitive results. Furthermore, we demonstrate the effectiveness of MSP and LSI through ablation studies.
Traveling abroad soon? Learn a language quickly with these 4 apps
These apps let you choose from over a hundred different languages. Traveling to another country is an exciting experience, but learning a new language in order to do so can be a challenge. Fitting lessons into your schedule is difficult and getting the right pronunciation down is always a struggle. With language learning apps like Babbel, Rosetta Stone, Beelinguapp and uTalk, you can learn a language at your own pace. These apps have hundreds of languages to choose from, and each app has a different approach and teaches a language differently.
- Europe (0.06)
- North America > United States (0.05)
Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages
Ayrapetyan, Alexan, Kostandian, Sofia, Yeroyan, Ara, Yerznkanyan, Mher, Karpov, Nikolay, Tadevosyan, Nune, Lavrukhin, Vitaly, Ginsburg, Boris
This study explores methods to increase data volume for low-resource languages using techniques such as crowdsourcing, pseudo-labeling, advanced data preprocessing, and various permissive data sources such as audiobooks, Common Voice, and YouTube. While these methods are well-explored for high-resource languages, their application to low-resource languages remains underexplored. Using Armenian and Georgian as case studies, we demonstrate how linguistic and resource-specific characteristics influence the success of these methods. This work provides practical guidance for researchers to choose cost-effective and quality-driven dataset extension strategies for low-resource languages. The key takeaway from various data extension approaches is that paid crowdsourcing offers the best balance between cost and quality, outperforming volunteer crowdsourcing, open-source audiobooks, and unlabeled data usage. An ablation study shows that models trained on the expanded datasets outperform existing baselines and achieve an ASR word error rate of 5.73% for Georgian and 9.9% for Armenian using a relatively small FastConformer architecture. We open-sourced both the Armenian and Georgian models to allow further research and practical applications.
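For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition. The function name and the whitespace tokenization are assumptions for the example, not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Published WER figures usually also depend on text normalization (casing, punctuation, numerals), which this sketch omits.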