Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS

Nguyen, Tuan Nam, Akti, Seymanur, Pham, Ngoc Quan, Waibel, Alexander

Oct-19-2024–arXiv.org Artificial Intelligence

Previous approaches on accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it difficult for listeners to understand them. Hence, we developed a new AC approach that not only focuses on accent conversion but also improves pronunciation of non-native accented speaker. By providing the non-native audio and the corresponding transcript, we generate the ideal ground-truth audio with native-like pronunciation with original duration and prosody. This ground-truth data aids the model in learning a direct mapping between accented and native speech. We utilize the end-to-end VITS framework to achieve high-quality waveform reconstruction for the AC task. As a result, our system not only produces audio that closely resembles native accents and while retaining the original speaker's identity but also improve pronunciation, as demonstrated by evaluation results.

artificial intelligence, encoder, machine learning, (16 more...)

arXiv.org Artificial Intelligence

Oct-19-2024

arXiv.org PDF

Add feedback

Country:
- North America > Canada
  - Quebec > Montreal (0.04)
- Europe
  - Germany > Baden-Württemberg
    - Karlsruhe Region > Karlsruhe (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.68)
  - Machine Learning > Neural Networks (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found