REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas

Dec-14-2020–arXiv.org Artificial Intelligence

Accent mismatch is a critical problem for end-to-end ASR. This paper addresses it by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by this proof of equivalence, we introduce reDAT, a novel technique based on DAT that relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT achieves competitive results against accent-specific baselines on both native and non-native English accents, and up to 13% relative WER reduction on unseen accents; reDAT yields further relative improvements over DAT of 3% and 8% on non-native accents of American and British English, respectively.
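
The Jensen-Shannon divergence named in the abstract can be illustrated with a minimal sketch. It assumes discrete distributions given as plain lists; the example posteriors are hypothetical, and the paper's own definition of "domain output distributions" is not stated in the abstract, so this shows only the quantity itself, not the authors' training objective.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing, by the usual convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical accent-classifier posteriors over three accents for two domains.
same = js_divergence([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])      # identical distributions
disjoint = js_divergence([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # non-overlapping support
```

With natural logarithms the divergence is zero for identical distributions and reaches its maximum, log 2, when the two distributions share no support, which is why driving it down pushes the domain outputs toward being indistinguishable.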

deep learning, neural network, soft label

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Clustering (0.35)
  - Speech (1.00)
