voice data


Cost Analysis of Human-corrected Transcription for Predominately Oral Languages

Diarra, Yacouba, Coulibaly, Nouhoum Souleymane, Leventhal, Michael

arXiv.org Artificial Intelligence

Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, particularly regarding the actual cost in human labor. This paper investigates the time and complexity required to produce high-quality annotated speech data for a subset of low-resource languages, low-literacy Predominately Oral Languages, focusing on Bambara, a Manding language of Mali. Through a one-month field study involving ten transcribers with native proficiency, we analyze the correction of ASR-generated transcriptions of 53 hours of Bambara voice data. We report that it takes, on average, 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions and 36 hours under field conditions. The study provides a baseline and practical insights for a large class of languages with comparable profiles undertaking the creation of NLP resources.
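The reported rates lend themselves to a quick back-of-the-envelope budget. The sketch below applies the abstract's 30 human-hours per speech-hour (laboratory) and 36 (field) figures; the helper function is a hypothetical illustration, not code from the paper.

```python
# Labor estimate using the rates reported in the abstract:
# 30 human-hours per hour of speech in the lab, 36 in the field.
# The function name is a hypothetical helper, not from the paper.
def transcription_labor_hours(speech_hours: float, field: bool = False) -> float:
    rate = 36.0 if field else 30.0  # human-hours per speech-hour
    return speech_hours * rate

# Applied to the study's 53-hour Bambara corpus:
print(transcription_labor_hours(53))              # -> 1590.0
print(transcription_labor_hours(53, field=True))  # -> 1908.0
```

At these rates, transcribing the full 53-hour corpus represents roughly 1,590 to 1,908 hours of human labor, which is the kind of baseline the paper aims to establish for comparable languages.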


PRAC3 (Privacy, Reputation, Accountability, Consent, Credit, Compensation): Long Tailed Risks of Voice Actors in AI Data-Economy

Sharma, Tanusree, Zhou, Yihao, Berisha, Visar

arXiv.org Artificial Intelligence

Early large-scale audio datasets, such as LibriSpeech, were built with hundreds of individual contributors whose voices were instrumental in the development of speech technologies, including audiobooks and voice assistants. Yet, a decade later, these same contributions have exposed voice actors to a range of risks. While existing ethical frameworks emphasize Consent, Credit, and Compensation (C3), they do not adequately address the emergent risks involving vocal identities that are increasingly decoupled from context, authorship, and control. Drawing on qualitative interviews with 20 professional voice actors, this paper reveals how synthetic replication of voice without clear provenance or enforceable constraints exposes individuals to both reputational and security threats. Beyond reputational harm, such as re-purposing voice data in erotic content, offensive political messaging, and meme culture, we document concerns about accountability breakdowns when their voice is leveraged to clone voices that are deployed in high-stakes scenarios such as financial fraud, misinformation campaigns, or impersonation scams. In such cases, actors face social and legal fallout without recourse, while very few of them have a legal representative or union protection. To make sense of these shifting dynamics, we introduce the PRAC3 framework - an expansion of C3 that foregrounds Privacy, Reputation, Accountability, Consent, Credit, and Compensation as interdependent pillars of data used in the synthetic voice economy. This framework captures how privacy risks are amplified through non-consensual training, how reputational harm arises from decontextualized deployment, and how accountability can be reimagined in AI data ecosystems. We argue that voice, as both a biometric identifier and creative labor, demands governance models that restore creator agency, ensure traceability, and establish enforceable boundaries for ethical reuse.


Experts reveal sneaky way your phone listens in on your conversations - and how to stop it

Daily Mail - Science & tech

It was long thought to be a myth and dismissed by big tech companies. But experts have revealed how listening in on your conversations has become a multi-billion dollar industry. Earlier this week, a leak from a leading marketing firm appeared to confirm how companies use microphones on your smart devices to eavesdrop before selling the data to advertisers. 'You can be talking to one of your friends about going on a vacation to Portugal through a phone call, and then a day later or that same day, what do you see? An advertisement for a trip,' data security expert Andy LoCascio told DailyMail.com.


Your bank wants your voice. Just say no.

FOX News

You already gave your bank your address, date of birth, Social Security number and your mother's maiden name. Now, they want your voice. Banks say it's an extra layer of biometric protection against fraud and cybercrime. But with the rise of hackers stealing voice data for deepfakes, is it worth the risk?


Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning

Yamada, Yasunori, Shinkawa, Kaoru, Kobayashi, Masatomo, Nemoto, Miyuki, Ota, Miho, Nemoto, Kiyotaka, Arai, Tetsuaki

arXiv.org Artificial Intelligence

Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with cognition-relevant everyday functioning remains uninvestigated. In this study, we demonstrate the feasibility of using a smartwatch-based application to collect acoustic features as objective markers for detecting deficits in everyday functioning. We collected voice data during the performance of cognitive tasks and daily conversation, as possible application scenarios, from 54 older adults, along with a measure of everyday functioning. Machine learning models using acoustic features could detect individuals with deficits in everyday functioning with up to 77.8% accuracy, which was higher than the 68.5% accuracy with standard neuropsychological tests. We also identified common acoustic features for robustly discriminating deficits in everyday functioning across both types of voice data (cognitive tasks and daily conversation). Our results suggest that common acoustic features extracted from different types of voice data can be used as markers for deficits in everyday functioning.
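The abstract's pipeline (acoustic features extracted from voice data, fed to a machine learning classifier evaluated on a 54-person cohort) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: synthetic stand-in features, a logistic-regression pipeline, and 5-fold cross-validation, none of which are specified by the paper.

```python
# Illustrative sketch only: the paper's actual features and models are not
# reproduced here. We mimic the setup with synthetic "acoustic features"
# (stand-ins for measures like pitch or pause ratio) and a cross-validated
# classifier over a 54-participant cohort, matching the abstract's n.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 54                                  # cohort size reported in the abstract
X = rng.normal(size=(n, 8))             # 8 hypothetical acoustic features
# Synthetic labels: deficit present when a weighted feature sum is positive.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(clf, X, y, cv=5)  # accuracy per fold
print(round(scores.mean(), 3))
```

Cross-validated accuracy, as in this sketch, is the usual way to report a figure like the paper's 77.8% on a small cohort, since a single train/test split would be unreliable at n = 54.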


Early Warning: Changes in Speech May Be the First Sign of Parkinson's Disease

#artificialintelligence

Parkinson's disease is a progressive nervous system disorder that affects movement and muscle control. Lithuanian researchers from Kaunas University of Technology (KTU) utilized AI to identify the early signs of Parkinson's disease using voice data. A diagnosis of Parkinson's disease upends many lives, and over 10 million people are currently living with the condition. Although there is no cure, early detection of symptoms can lead to better management of the disease. As the disease progresses, changes in speech can occur alongside other symptoms.


Talk to me: How AI can diagnose disease - POLITICO

#artificialintelligence

EXPRESSING A DISEASE: Want to know whether you have Covid-19 or even Alzheimer's? Artificial intelligence might soon have an answer just by listening to your voice. Leading researchers are developing technology that sorts through evidence of so-called vocal biomarkers to home in on medical conditions that might not be detectable during routine office visits or exams. "This line might seem to have been lifted from a Star Trek script," said Bertalan Meskó, director of the Medical Futurist Institute. "But we are close to having such conversations with our computers."


Addressing the Selection Bias in Voice Assistance: Training Voice Assistance Model in Python with Equal Data Selection

Piya, Kashav, Shrestha, Srijal, Frank, Cameran, Jebessa, Estephanos, Mohd, Tauheed Khan

arXiv.org Artificial Intelligence

In recent times, voice assistants have become a part of our day-to-day lives, allowing information retrieval by voice synthesis, voice recognition, and natural language processing. These voice assistants can be found in many modern-day devices such as those from Apple, Amazon, Google, and Samsung. This project is primarily focused on Virtual Assistance in Natural Language Processing. Natural Language Processing is a form of AI that helps machines understand people and create feedback loops. This project will use deep learning to create a Voice Recognizer, training the model in Google Colaboratory on Common Voice and data collected from the local community. After recognizing a command, the AI assistant will be able to perform the most suitable actions and then give a response. The motivation for this project comes from the race and gender bias that exists in many virtual assistants. The computer industry is primarily dominated by the male gender, and because of this, many of the products produced do not account for women. This bias has an impact on natural language processing. This project will be utilizing various open-source projects to implement machine learning algorithms and train the assistant algorithm to recognize different types of voices, accents, and dialects. Through this project, the goal is to use voice data from underrepresented groups to build a voice assistant that can recognize voices regardless of gender, race, or accent. Increasing the representation of women in the computer industry is important for the future of the industry. By representing women in the initial study of voice assistants, it can be shown that females play a vital role in the development of this technology. In line with related work, this project will use first-hand data from the college population and middle-aged adults to train a voice assistant to combat gender bias.


The Race to Hide Your Voice

WIRED

Your voice reveals more about you than you realize. To the human ear, your voice can instantly give away your mood, for example--it's easy to tell if you're excited or upset. But machines can learn a lot more: inferring your age, gender, ethnicity, socio-economic status, health conditions, and beyond. Researchers have even been able to generate images of faces based on the information contained in individuals' voice data. As machines become better at understanding you through your voice, companies are cashing in.


Deepdub closes fresh round for dubbing AI that dubs movies, shows, and games - Dataconomy

#artificialintelligence

Dubbing, where recordings in other languages are lip-synced and mixed with a show's original soundtrack, is an exploding business. One localization platform, Zoo Digital, saw revenues jump by 73% to $28.6 million in July 2018 compared to the year prior. Another, BTI Studios, told Television Business International that dubbing grew from 3% of its revenue in 2010 to 61% in 2019. According to Verified Market Research, the film dubbing market alone could be worth $3.6 billion by 2027, growing at a compound annual growth rate of 5.6% from 2020. But barriers stand in the way of expansion.