human voice
Xania Monet's music is the stuff of nightmares. Thankfully her AI 'clankers' will be limited to this cultural moment Van Badham
Xania Monet is 'a photorealistic digital avatar accompanied by a sound that computers have generated to resemble that of a human voice singing words', writes Van Badham. Xania Monet is the latest digital nightmare to emerge from a hellscape of AI content production. The music iteration of AI "actor" Tilly Norwood, Xania is a composite product manufactured from digital tools: in this case, a photorealistic avatar accompanied by a sound that computers have generated to resemble that of a human voice singing words.
- North America > United States (0.17)
- Oceania > Australia (0.06)
- Europe > Ukraine (0.06)
- Europe > Denmark (0.05)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Chen, Shunian, Xie, Xinyuan, Chen, Zheshu, Zhao, Liyan, Lee, Owen, Su, Zhan, Sun, Qilin, Wang, Benyou
High-quality, large-scale audio captioning is crucial for advancing audio understanding, yet current automated methods often generate captions that lack fine-grained detail and contextual accuracy, primarily due to their reliance on limited unimodal or superficial multimodal information. Drawing inspiration from human auditory perception, which adeptly integrates cross-modal cues and performs sophisticated auditory scene analysis, we introduce a novel two-stage automated pipeline. This pipeline first employs specialized pretrained models to extract diverse contextual cues (e.g., speech, music, general sounds, and visual information from associated video). A large language model (LLM) then synthesizes these rich, multimodal inputs to generate detailed and context-aware audio captions. Key contributions of this work include: (1) the proposed scalable method for fine-grained audio caption generation; (2) FusionAudio, a new large-scale dataset comprising 1.2 million such detailed captions, combined with 6 million QA pairs; and (3) enhanced audio models developed using FusionAudio, specifically a CLAP-based audio encoder with superior audio-text alignment and instruction following. This paper paves the way for more nuanced and accurate automated understanding of complex audio environments. Code and data can be found at https://github.com/satsuki2486441738/FusionAudio.
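The two-stage pipeline the abstract describes can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the extractor and synthesis steps, which in the real pipeline would be pretrained models (ASR, a music tagger, a sound-event detector, a video captioner) and an LLM prompt, are stubbed out here, and all function names are invented.

```python
# Hypothetical sketch of a two-stage multimodal captioning pipeline.
# Stage 1: per-modality extractors produce contextual cues.
# Stage 2: a synthesis step fuses the cues into one detailed caption.

def extract_cues(clip):
    # Stage 1 stub: each entry stands in for a specialized pretrained
    # model run on the clip (speech, music, sound events, visuals).
    return {
        "speech": f"transcript of {clip}",
        "music": f"music tags for {clip}",
        "sound": f"sound events in {clip}",
        "visual": f"visual context of {clip}",
    }

def synthesize_caption(cues):
    # Stage 2 stub: a real system would prompt an LLM with these cues;
    # here they are simply joined in a fixed order.
    ordered = [cues[k] for k in ("speech", "music", "sound", "visual")]
    return "; ".join(ordered)

caption = synthesize_caption(extract_cues("clip_001.mp4"))
```

The point of the structure is that caption quality is bounded by cue quality: each extractor can be swapped or upgraded independently of the fusion step.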
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
- Leisure & Entertainment (0.93)
- Media > Music (0.46)
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
Toyin, Hawau Olamide, Marew, Rufael, Alblooshi, Humaid, Magdy, Samar M., Aldarmaki, Hanan
We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions, intended for multi-speaker speech synthesis and useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from six voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from two commercial systems. The complete corpus consists of a total of 83.52 hours of speech across 11 voices; around 10 hours consist of human voices from 7 speakers. We train three open-source TTS and two voice conversion systems to illustrate the use cases of the dataset. The corpus is available for research use.
- Africa > Middle East > Egypt (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (6 more...)
Is a Chat with a Bot a Conversation?
You are at the Princess's ball, and she is telling you a secret, but her orchestra of bears is making such a fearful lot of noise you cannot hear what she is saying. What do you say, dear? I'd lean in closer and say, "Could you repeat that? The bear-itone section is a bit too enthusiastic tonight!" In 1958, the year the illustrated children's book "What Do You Say, Dear?" appeared, the leaders of a field newly dubbed "artificial intelligence" spoke at a conference in Teddington, England, on "The Mechanisation of Thought Processes." Marvin Minsky, of M.I.T., talked about heuristic programming; Alan Turing gave a paper called "Learning Machines"; Grace Hopper assessed the state of computer languages; and scientists from Bell Labs débuted a computer that could synthesize human speech by having it sing "Daisy Bell" ("Daisy, Daisy, give me your answer, do . .
- Europe > United Kingdom > England (0.24)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > France (0.04)
Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments
Shi, Zhonghao, Chen, Han, Velentza, Anna-Maria, Liu, Siqi, Dennler, Nathaniel, O'Connell, Allison, Matarić, Maja
Mindfulness-based therapies have been shown to be effective in improving mental health, and technology-based methods have the potential to expand the accessibility of these therapies. To enable real-time personalized content generation for mindfulness practice in these methods, high-quality computer-synthesized text-to-speech (TTS) voices are needed to provide verbal guidance and respond to user performance and preferences. However, the user-perceived quality of state-of-the-art TTS voices has not yet been evaluated for administering mindfulness meditation, which requires emotional expressiveness. In addition, work has not yet been done to study the effect of physical embodiment and personalization on the user-perceived quality of TTS voices for mindfulness. To that end, we designed a two-phase human subject study. In Phase 1, an online Mechanical Turk between-subject study (N=471) evaluated 3 (feminine, masculine, child-like) state-of-the-art TTS voices with 2 (feminine, masculine) human therapists' voices in 3 different physical embodiment settings (no agent, conversational agent, socially assistive robot) with remote participants. Building on findings from Phase 1, in Phase 2, an in-person within-subject study (N=94), we used a novel framework we developed for personalizing TTS voices based on user preferences, and evaluated user-perceived quality compared to best-rated non-personalized voices from Phase 1. We found that the best-rated human voice was perceived better than all TTS voices; the emotional expressiveness and naturalness of TTS voices were poorly rated, while users were satisfied with the clarity of TTS voices. Surprisingly, by allowing users to fine-tune TTS voice features, the user-personalized TTS voices could perform almost as well as human voices, suggesting user personalization could be a simple and very effective tool to improve user-perceived quality of TTS voice.
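The Phase 2 idea of letting users fine-tune TTS voice features until the result suits them can be sketched as a search over candidate feature settings scored by a user rating. Everything below is illustrative rather than the authors' framework: the feature names, their ranges, and the rating function (here a simulated listener who prefers a lower, slower, warmer voice) are all invented for the sketch.

```python
from itertools import product

# Candidate values for tunable TTS voice features (illustrative ranges).
FEATURES = {
    "pitch_shift": [-2, 0, 2],         # semitones relative to default
    "speaking_rate": [0.8, 1.0, 1.2],  # multiple of default rate
    "warmth": [0.3, 0.6, 0.9],         # arbitrary timbre control
}

def simulated_user_rating(settings):
    # Stand-in for a real listener's preference score: this fictitious
    # user likes a slightly lower, slower, warmer voice for meditation.
    target = {"pitch_shift": -2, "speaking_rate": 0.8, "warmth": 0.9}
    return -sum(abs(settings[k] - target[k]) for k in target)

def personalize():
    # Try every combination of feature values and keep the best-rated.
    best, best_score = None, float("-inf")
    for values in product(*FEATURES.values()):
        settings = dict(zip(FEATURES.keys(), values))
        score = simulated_user_rating(settings)
        if score > best_score:
            best, best_score = settings, score
    return best

preferred = personalize()
```

In a real study the rating would come from the participant listening to each rendered voice, and the search would likely be interactive (sliders) rather than exhaustive, but the loop above captures why personalization can close most of the gap to a human voice: the objective is the individual user's preference, not an average.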
- North America > United States > California > Los Angeles County > Los Angeles (0.29)
- Europe > Sweden > Stockholm > Stockholm (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Strength High (0.68)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.86)
Top 5 AI Voice Generators: Enhancing Your Business With Next-Gen Voice Solutions
Artificial intelligence (AI) has been advancing rapidly in recent years, and one area where it has made significant progress is in the generation of human-like voices. AI voice generators have emerged as game-changers. An AI voice generator is a type of software or technology that uses artificial intelligence (AI) algorithms to produce synthesized speech that sounds like a human voice. These generators, also known as text-to-speech (TTS) engines, convert written text into spoken words. These tools save time and money and offer a diverse range of realistic and human-like voices for various applications.
- Leisure & Entertainment (0.55)
- Media (0.34)
Microsoft's new VALL-E AI can capture your voice in 3 seconds
Microsoft researchers have presented an impressive new text-to-speech AI model, called Vall-E, which can listen to a voice for just a few seconds, then mimic that voice – including the emotional tone and acoustics – to say whatever you like. It's the latest of many AI algorithms that can harness a recording of a person's voice and make it say words and sentences that person never spoke – and it's remarkable for just how small a scrap of audio it needs in order to extrapolate an entire human voice. Where 2017's Lyrebird algorithm from the University of Montreal, for example, needed a full minute of speech to analyze, Vall-E needs just a three-second audio snippet. The AI has been trained on some 60,000 hours of English speech – read mainly, it seems, by audiobook narrators – and the researchers have presented a swag of samples in which Vall-E attempts to puppeteer a range of human voices. Some do a pretty extraordinary job of capturing the essence of the voice and building new sentences that sound natural – you'd struggle to tell which was the real voice and which was the synthesis. In others, the only giveaway is when the AI puts the emphasis in strange places in the sentence.
James Earl Jones done as Darth Vader, but his voice will live on because of AI
"Luke, I am your father" are five of the most famous words ever spoken on screen. When Darth Vader shattered Luke Skywalker's world in "The Empire Strikes Back," he sent shivers down the spines of audiences everywhere--in large part because of actor James Earl Jones' famous baritone. Now, Jones, 91, has announced he is hanging up the mask and retiring as the voice of one of the most infamous cinematic villains. But don't despair: Although Jones will no longer record new lines for Star Wars projects, the character--and Jones' voice--will live on thanks to artificial intelligence. As first reported by Vanity Fair, Respeecher, a Ukrainian voice synthesis company, will use a combination of archival recordings, voice acting and AI technology to continue bringing Darth Vader to the screen.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
Parkinson's and CANCER can be picked up in your VOICE with new app under development
A mobile app may soon be able to diagnose you with chronic health conditions using the sound of your voice. Scientists are building an artificial intelligence that analyzes vibrations in speech and breathing patterns to look for clues for illness. The National Institutes of Health is funding a mammoth research project to collect voice data that will build the AI. Experts already know that speech is altered by conditions like Parkinson's or stroke, while breathing is affected by lung diseases. But the hope is that the computer program will be able to diagnose a wide range of conditions - including cancer and depression.
- Health & Medicine > Therapeutic Area > Oncology (0.71)
- Health & Medicine > Health Care Technology > Telehealth (0.57)
Horses and pigs can distinguish between negative and positive sounds in human speech
From 'Babe' to 'Black Beauty', popular culture is constantly telling us that speaking to animals gently and 'politely' is the best way to get them to do our bidding. Now a new study has shown the same is true in the real world, as domesticated animals like pigs and horses can tell the difference between negative and positive sounds in human speech. Researchers from the University of Copenhagen's Department of Biology and ETH Zurich found that the animals reacted more strongly to 'negatively charged' human voices. In some cases they even seemed to mirror the emotion expressed in the human voice, according to the researchers. Researchers concluded that it is most likely that horses may be able to perceive and interpret each other's sounds by virtue of their common biology.
- Europe > Denmark > Capital Region > Copenhagen (0.26)
- Europe > Switzerland > Zürich > Zürich (0.25)
- Europe > France (0.05)