AITopics | zero-shot singing voice conversion

Collaborating Authors

zero-shot singing voice conversion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

Bai, Bingsong, Geng, Yizhong, Wang, Fengping, Wang, Cong, Guo, Puyuan, Gao, Yingming, Li, Ya

arXiv.org Artificial IntelligenceNov-18-2025

Zero-shot singing voice conversion (SVC) transforms a source singer's timbre to an unseen target speaker's voice while preserving melodic content without fine-tuning. Existing methods model speaker timbre and vocal content separately, losing essential acoustic information that degrades output quality while requiring significant computational resources. To overcome these limitations, we propose HQ-SVC, an efficient framework for high-quality zero-shot SVC. HQ-SVC first extracts jointly content and speaker features using a decoupled codec. It then enhances fidelity through pitch and volume modeling, preserving critical acoustic information typically lost in separate modeling approaches, and progressively refines outputs via differentiable signal processing and diffusion techniques. Evaluations confirm HQ-SVC significantly outperforms state-of-the-art zero-shot SVC methods in conversion quality and efficiency. Beyond voice conversion, HQ-SVC achieves superior voice naturalness compared to specialized audio super-resolution methods while natively supporting voice super-resolution tasks.

large language model, machine learning, voice conversion, (16 more...)

arXiv.org Artificial Intelligence

2511.08496

Country: Asia (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis

Wang, Zihao, Ma, Le, Liu, Yan, Zhang, Kejun

arXiv.org Artificial IntelligenceJul-10-2024

Singing voice conversion (SVC) aims to convert a singer's voice in a given music piece to another singer while keeping the original content. We propose an end-to-end feature disentanglement-based model, which we named SaMoye, to enable zero-shot many-to-many singing voice conversion. SaMoye disentangles the features of the singing voice into content features, timbre features, and pitch features respectively. The content features are enhanced using a GPT-based model to perform cross-prediction with the phoneme of the lyrics. SaMoye can generate the music with converted voice by replacing the timbre features with the target singer. We also establish an unparalleled large-scale dataset to guarantee zero-shot performance. The dataset consists of 1500k pure singing vocal clips containing at least 10,000 singers.

feature disentanglement and synthesis, voice conversion, zero-shot singing voice conversion, (9 more...)

arXiv.org Artificial Intelligence

2407.07728

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.41)

Industry:

Media > Music (0.47)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.98)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

Add feedback