Overview of the Amphion Toolkit (v0.2)
Li, Jiaqi; Zhang, Xueyao; Wang, Yuancheng; He, Haorui; Wang, Chaoren; Wang, Li; Liao, Huan; Ao, Junyi; Xie, Zeyu; Huang, Yiqiao; Zhang, Junan; Wu, Zhizheng
arXiv.org Artificial Intelligence
Amphion is an open-source toolkit for Audio, Music, and Speech Generation, designed to lower the entry barrier for junior researchers and engineers in these fields. It provides a versatile framework that supports a variety of generation tasks and models. In this report, we introduce Amphion v0.2, the second major release developed in 2024. This release features a 100K-hour open-source multilingual dataset, a robust data preparation pipeline, and novel models for tasks such as text-to-speech, audio coding, and voice conversion. Furthermore, the report includes multiple tutorials that guide users through the functionalities and usage of the newly released models.
Feb-11-2025
- Country:
  - Europe (0.92)
  - North America > United States
    - Michigan (0.14)
    - Pennsylvania (0.14)
- Genre:
  - Research Report > Promising Solution (0.33)
- Industry:
  - Law > Government & the Courts (0.67)
  - Leisure & Entertainment (0.67)
  - Media (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Neural Networks
        - Deep Learning (1.00)
      - Natural Language
        - Chatbot (0.67)
        - Large Language Model (1.00)
        - Text Processing (1.00)
      - Representation & Reasoning (1.00)
      - Speech > Speech Recognition (1.00)
      - Vision (1.00)
    - Data Science (1.00)