Training-Free Voice Conversion with Factorized Optimal Transport

Lobashev, Alexander, Yermekova, Assel, Larchenko, Maria

Jun-12-2025–arXiv.org Artificial Intelligence

This paper introduces Factorized MKL-VC, a training-free modification of the kNN-VC pipeline. In contrast with the original pipeline, our algorithm performs high-quality any-to-any cross-lingual voice conversion with only 5 seconds of reference audio. MKL-VC replaces kNN regression with a factorized optimal transport map in WavLM embedding subspaces, derived from the Monge-Kantorovich Linear solution. Factorization addresses non-uniform variance across dimensions, ensuring effective feature transformation. Experiments on the LibriSpeech and FLEURS datasets show that MKL-VC significantly improves content preservation and robustness with short reference audio, outperforming kNN-VC. MKL-VC achieves performance comparable to FACodec, especially in the cross-lingual voice conversion domain.
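As a rough illustration of the idea in the abstract, the sketch below fits the closed-form linear optimal transport map between Gaussian approximations of source and target features, independently in each subspace. It is not the authors' implementation. The function names, the `block_size` and `eps` parameters, and the split into contiguous blocks of dimensions are assumptions made for illustration; the abstract does not say how the subspaces are chosen. Random arrays stand in for WavLM embeddings in the usage lines.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def mkl_map(src, tgt, eps=1e-6):
    """Fit the linear OT map between Gaussian fits of src and tgt.

    src, tgt: arrays of shape (n_frames, d). Returns a callable x -> T(x)
    with T(x) = mu_t + A (x - mu_s), where
    A = S^{-1/2} (S^{1/2} C_t S^{1/2})^{1/2} S^{-1/2} and S = C_s.
    """
    d = src.shape[1]
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    # eps * I is an assumed regularizer to keep the covariances invertible.
    cov_s = np.cov(src, rowvar=False).reshape(d, d) + eps * np.eye(d)
    cov_t = np.cov(tgt, rowvar=False).reshape(d, d) + eps * np.eye(d)
    s_half = _sqrtm_psd(cov_s)
    s_half_inv = np.linalg.inv(s_half)
    a = s_half_inv @ _sqrtm_psd(s_half @ cov_t @ s_half) @ s_half_inv
    return lambda x: (x - mu_s) @ a.T + mu_t


def factorized_mkl_convert(src, tgt, block_size=2, eps=1e-6):
    """Apply an independent linear OT map in each block of dimensions.

    Assumption: subspaces are contiguous blocks of `block_size` dimensions.
    """
    d = src.shape[1]
    out = np.empty(src.shape, dtype=float)
    for start in range(0, d, block_size):
        idx = slice(start, min(start + block_size, d))
        out[:, idx] = mkl_map(src[:, idx], tgt[:, idx], eps)(src[:, idx])
    return out


# Usage with synthetic stand-ins for source and reference features.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 4)) * np.array([1.0, 5.0, 0.5, 2.0])
tgt = rng.normal(size=(400, 4)) * np.array([2.0, 1.0, 3.0, 0.2]) + 1.0
converted = factorized_mkl_convert(src, tgt, block_size=2)
```

Fitting one map per small block means each covariance estimate involves only a few dimensions, which is one plausible reason a short reference clip can suffice. Within each block, the converted features match the target mean and covariance up to the regularizer.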

artificial intelligence, machine learning, natural language, (17 more...)


Country:
- Asia (0.68)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Speech (0.69)
  - Natural Language (0.69)
