AITopics | text-to-speech model

text-to-speech model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4730d10b22261faa9a95ebf7497bc556-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 16:55:05 GMT

arxiv preprint arxiv, generspeech, representation, (13 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.77)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech

Neural Information Processing Systems | Aug-14-2025, 14:18:20 GMT

This paper proposes GenerSpeech, a text-to-speech model aimed at high-fidelity zero-shot style transfer for out-of-domain (OOD) custom voices.

arxiv preprint arxiv, generspeech, representation, (13 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.77)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)

Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration

Lou, Haowei; Paik, Helen; Hu, Wen; Yao, Lina

arXiv.org Artificial Intelligence | Dec-11-2024

Recent advancements in text-to-speech (TTS) systems, such as FastSpeech and StyleSpeech, have significantly improved speech generation quality. However, these models often rely on durations generated by external tools like the Montreal Forced Aligner, which can be time-consuming and lack flexibility. The importance of accurate durations is often underestimated, despite their crucial role in achieving natural prosody and intelligibility. To address these limitations, we propose a novel Aligner-Guided Training Paradigm that prioritizes accurate duration labelling by training an aligner before the TTS model. This approach reduces dependence on external tools and enhances alignment accuracy. We further explore the impact of different acoustic features, including Mel-Spectrograms, MFCCs, and latent features, on TTS model performance. Our experimental results show that aligner-guided duration labelling can achieve up to a 16% improvement in word error rate and significantly enhance phoneme and tone alignment. These findings highlight the effectiveness of our approach in optimizing TTS systems for more natural and intelligible speech generation.
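The abstract does not say how the trained aligner's output is turned into duration labels. A minimal sketch, assuming the aligner produces a matrix of log-likelihoods `log_p[i][j]` scoring phoneme `i` against acoustic frame `j`: monotonic alignment search (MAS), the dynamic program popularized by Glow-TTS, picks the best monotonic path in which every frame belongs to exactly one phoneme and every phoneme receives at least one frame. Counting frames per phoneme then yields the integer durations that a FastSpeech-style duration predictor is trained on. The function names and toy scores are hypothetical, and this is not claimed to be the paper's own procedure.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def monotonic_alignment(log_p: Sequence[Sequence[float]]) -> List[int]:
    """Monotonic alignment search (as in Glow-TTS), a sketch.

    Assumption: log_p[i][j] is a score from some trained aligner for
    phoneme i at frame j. This is not necessarily the paper's method.
    Returns, for each frame, the index of the phoneme assigned to it.
    """
    n_tokens = len(log_p)
    n_frames = len(log_p[0]) if n_tokens else 0
    if n_tokens == 0 or n_frames < n_tokens:
        raise ValueError("need at least one frame per phoneme")

    # q[i][j]: best total score of a path that assigns frame j to phoneme i
    q = [[NEG_INF] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        # Phoneme i cannot be reached before frame i
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else NEG_INF
            best = max(stay, advance)
            if best > NEG_INF:
                q[i][j] = best + log_p[i][j]

    # Backtrack from (last phoneme, last frame) to (first phoneme, first frame)
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        # Step back to the previous phoneme when that predecessor scores at least as well
        if j > 0 and i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return path


def durations_from_path(path: Sequence[int], n_tokens: int) -> List[int]:
    """Count frames per phoneme; these counts are the duration labels."""
    durations = [0] * n_tokens
    for i in path:
        durations[i] += 1
    return durations


if __name__ == "__main__":
    # Hypothetical toy scores: 3 phonemes, 6 frames (0.0 = good match, -5.0 = poor)
    toy_log_p = [
        [0.0, 0.0, -5.0, -5.0, -5.0, -5.0],
        [-5.0, -5.0, 0.0, 0.0, 0.0, -5.0],
        [-5.0, -5.0, -5.0, -5.0, -5.0, 0.0],
    ]
    path = monotonic_alignment(toy_log_p)
    print(path)                          # [0, 0, 1, 1, 1, 2]
    print(durations_from_path(path, 3))  # [2, 3, 1]
```

Because the path may only stay on a phoneme or advance by one per frame, the durations always sum to the number of frames, which is the property a FastSpeech-style length regulator needs to expand phoneme encodings to frame length.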

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.08112

Country:

North America > Canada > Quebec > Montreal (0.25)
Oceania > Australia > New South Wales > Kensington (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Now you can chat with ChatGPT using your voice

MIT Technology Review | Sep-25-2023, 12:00:11 GMT

In a demo the company gave me last week, Joanne Jang, a product manager, showed off ChatGPT's range of synthetic voices. These were created by training the text-to-speech model on the voices of actors that OpenAI had hired. In the future it might even allow users to create their own voices. "In fashioning the voices, the number-one criterion was whether this is a voice you could listen to all day," she says. They are chatty and enthusiastic but won't be to everyone's taste.

chatgpt, openai, text-to-speech model, (2 more...)

MIT Technology Review

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)

Artificial Intelligence brings Steve Jobs "back to life"

#artificialintelligence | Jan-6-2023, 05:30:28 GMT

podcast.ai released a unique and impressive AI-generated interview in which Joe Rogan talks with Steve Jobs. The founder of Apple passed away in 2011, but that didn't stop podcast.ai from bringing his voice back to life using artificial intelligence; Rogan's voice in the episode is synthetic as well. "On this episode, I welcome a friend who is difficult to describe. I am fascinated by him, and I hope you will be too. He is weird and brilliant and something insufferable. But my guest today has made some of the best technological products of our age, and he is always pushing the envelope in innovation," the AI-generated Joe Rogan said in his introduction to the episode.

artificial intelligence bring steve jobs, joe rogan, steve jobs, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence (1.00)

Software Translates Your Voice into Another Language

AITopics Original Links | Jan-18-2017, 12:02:57 GMT

Researchers at Microsoft have made software that can learn the sound of your voice, and then use it to speak a language that you don't. The system could be used to make language tutoring software more personal, or to make tools for travelers. In a demonstration at Microsoft's Redmond, Washington, campus on Tuesday, Microsoft research scientist Frank Soong showed how his software could read out text in Spanish using the voice of his boss, Rick Rashid, who leads Microsoft's research efforts. In a second demonstration, Soong used his software to grant Craig Mundie, Microsoft's chief research and strategy officer, the ability to speak Mandarin.

artificial intelligence, software, soong, (13 more...)

AITopics Original Links

Country:

North America > United States > Washington > King County > Redmond (0.26)
Asia > China > Beijing > Beijing (0.07)
North America > United States > California > Los Angeles County > Los Angeles (0.06)

Technology: Information Technology > Artificial Intelligence > Speech (0.35)
