AITopics | Wang, Chaoren

Collaborating Authors

Wang, Chaoren

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Overview of the Amphion Toolkit (v0.2)

Li, Jiaqi, Zhang, Xueyao, Wang, Yuancheng, He, Haorui, Wang, Chaoren, Wang, Li, Liao, Huan, Ao, Junyi, Xie, Zeyu, Huang, Yiqiao, Zhang, Junan, Wu, Zhizheng

arXiv.org Artificial IntelligenceFeb-11-2025

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.15442

Country:

Europe (0.92)
North America > United States > Pennsylvania (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report > Promising Solution (0.33)

Industry:

Media (1.00)
Law > Government & the Courts (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
(5 more...)

Add feedback

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

He, Haorui, Shang, Zengqiang, Wang, Chaoren, Li, Xuyuan, Gu, Yicheng, Hua, Hua, Liu, Liwei, Yang, Chen, Li, Jiaqi, Shi, Peiyang, Wang, Yuancheng, Chen, Kai, Zhang, Pengyuan, Wu, Zhizheng

arXiv.org Artificial IntelligenceJan-27-2025

Recent advancements in speech generation have been driven by the large-scale training datasets. However, current models fall short of capturing the spontaneity and variability inherent in real-world human speech, due to their reliance on audiobook datasets limited to formal read-aloud speech styles. To bridge this gap, we introduce Emilia-Pipe, an open-source preprocessing pipeline to extract high-quality training data from valuable yet underexplored in-the-wild data that capture spontaneous human speech in real-world contexts. By leveraging Emilia-Pipe, we construct Emilia, the first multilingual speech generation dataset derived from in-the-wild speech data. This dataset comprises over 101k hours of speech across six languages: English, Chinese, German, French, Japanese, and Korean. Besides, we expand Emilia to Emilia-Large, a dataset exceeding 216k hours, making it the largest open-source speech generation dataset available. Extensive experiments demonstrate that Emilia significantly outperforms traditional audiobook datasets in generating spontaneous and human-like speech, showcasing superior performance in capturing diverse speaker timbre and speaking styles of real-world human speech. Furthermore, this work underscores the importance of scaling dataset size to advance speech generation research and validates the effectiveness of Emilia for both multilingual and crosslingual speech generation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.15907

Country: Asia > China (0.68)

Genre: Research Report (0.82)

Industry: Media (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Add feedback

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

He, Haorui, Shang, Zengqiang, Wang, Chaoren, Li, Xuyuan, Gu, Yicheng, Hua, Hua, Liu, Liwei, Yang, Chen, Li, Jiaqi, Shi, Peiyang, Wang, Yuancheng, Chen, Kai, Zhang, Pengyuan, Wu, Zhizheng

arXiv.org Artificial IntelligenceJul-12-2024

Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggle to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper present Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation. Emilia starts with over 101k hours of speech in six languages and features diverse speech with varied speaking styles. To facilitate the scale-up of Emilia, the open-source pipeline Emilia-Pipe can process one hour of raw speech data ready for model training in a few mins, which enables the research community to collaborate on large-scale speech generation research. Experimental results validate the effectiveness of Emilia. Demos are available at: https://emilia-dataset.github.io/Emilia-Demo-Page/.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.05361

Country: Asia > China (0.69)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Add feedback