Unsupervised ASR via Cross-Lingual Pseudo-Labeling
Likhomanenko, Tatiana, Lugosch, Loren, Collobert, Ronan
arXiv.org Artificial Intelligence
Recent work has shown that it is possible to train an unsupervised automatic speech recognition (ASR) system using only unpaired audio and text. Existing unsupervised ASR methods assume that no labeled data can be used for training. We argue that even if one does not have any labeled audio for a given language, there is always labeled data available for other languages. We show that it is possible to use character-level acoustic models (AMs) from other languages to bootstrap an unsupervised AM in a new language; here, "unsupervised" means that no labeled audio is available for the target language. Our approach is based on two key ingredients: (i) generating pseudo-labels (PLs) of the target language using an AM from some other language, and (ii) constraining these PLs with a language model for the target language. By using an AM trained on labeled data from a source language (e.g. Spanish, es) and generating pseudo-labels constrained by a language model for the desired target language (e.g. English), we can train an unsupervised speech recognition system for the target language via iterative pseudo-labeling. Our approach is effective on Common Voice, and it outperforms character-based wav2vec-U 2.0 by 15% absolute WER on LJSpeech with 800h of labeled German data instead of 60k hours of unlabeled English data.
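The two ingredients above can be illustrated with a minimal sketch of one pseudo-labeling round. This is not the authors' code: the helper names, the greedy CTC decoder, the toy character-bigram language model, and the score threshold are all assumptions introduced for illustration; in practice a beam-search decoder and a stronger target-language LM would be used.

```python
# Sketch of one pseudo-labeling round (hypothetical helpers, not the paper's code):
# (i) decode target-language audio with a source-language AM's frame outputs,
# (ii) keep only transcripts a target-language LM considers plausible.

BLANK = "_"  # CTC blank token

def ctc_greedy_decode(frame_logprobs, alphabet):
    """Pick the best character per frame, collapse repeats, drop blanks."""
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_logprobs]
    out, prev = [], None
    for ch in best:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)

def bigram_lm_score(text, bigram_logprobs, floor=-10.0):
    """Average per-transition log-prob under a toy character bigram LM."""
    if len(text) < 2:
        return floor
    total = sum(bigram_logprobs.get(a + b, floor) for a, b in zip(text, text[1:]))
    return total / (len(text) - 1)

def select_pseudo_labels(utterances, alphabet, bigram_logprobs, threshold=-3.0):
    """Decode each utterance with the (source-language) AM outputs, then
    keep only pseudo-labels the target-language LM scores above threshold."""
    kept = []
    for audio_id, frame_logprobs in utterances:
        hyp = ctc_greedy_decode(frame_logprobs, alphabet)
        if bigram_lm_score(hyp, bigram_logprobs) >= threshold:
            kept.append((audio_id, hyp))
    return kept
```

In iterative pseudo-labeling, the kept (audio, pseudo-label) pairs would then be used to train a target-language AM, which replaces the source-language AM in the next round.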
Sep-29-2023
- Country:
- Africa
- Asia > India (0.04)
- Europe
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Italy > Apulia (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- North America > United States
- Texas > Dallas County > Dallas (0.04)
- South America
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Peru (0.04)
- Genre:
- Research Report (1.00)
- Technology: