AITopics | Xia, Kangxiang

Collaborating Authors

Xia, Kangxiang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

Guo, Dake, Yao, Jixun, Zhu, Xinfa, Xia, Kangxiang, Guo, Zhao, Zhang, Ziyu, Wang, Yao, Liu, Jie, Xie, Lei

arXiv.org Artificial IntelligenceOct-31-2024

This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2. In Track 1, we employ Single-Codec to tokenize the speech into discrete tokens and use a language-model-based approach to achieve zero-shot speaking style cloning. The Single-Codec effectively decouples timbre and speaking style at the token level, reducing the acoustic modeling burden on the autoregressive language model. Additionally, we use DSPGAN to upsample 16 kHz mel-spectrograms to high-fidelity 48 kHz waveforms. In Track 2, we propose a background audio generator based on large language models (LLMs). This system produces scene-appropriate accompaniment descriptions, synthesizes background audio with Tango 2, and integrates it with the speech generated by our Track 1 system. Our submission achieves the second place and the first place in Track 1 and Track 2 respectively.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.23815

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings

Xia, Kangxiang, Guo, Dake, Yao, Jixun, Xue, Liumeng, Li, Hanzhao, Wang, Shuai, Guo, Zhao, Xie, Lei, Zhang, Qingqing, Luo, Lei, Dong, Minghui, Sun, Peng

arXiv.org Artificial IntelligenceOct-31-2024

The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous style voice cloning, particularly focusing on generating spontaneous behaviors in conversational speech. The challenge comprises two tracks: an unconstrained track without limitation on data and model usage, and a constrained track only allowing the use of constrained open-source datasets. A 100-hour high-quality conversational speech dataset is also made available with the challenge. This paper details the data, tracks, submitted systems, evaluation results, and findings.

artificial intelligence, machine learning, speech, (16 more...)

arXiv.org Artificial Intelligence

2411.00064

Country: Asia > China (0.47)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback