Understanding Shared Speech-Text Representations
Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang
–arXiv.org Artificial Intelligence
Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and Speech Translation (ST) performance. In this paper, we expand our understanding of the resulting shared speech-text representations with two types of analyses. First, we examine the limits of speech-free domain adaptation, finding that a corpus-specific duration model for speech-text alignment is the most important component for learning a shared speech-text representation.

In this work, we expand on this understanding in two directions. First, we evaluate the ability to transfer information from one domain to another through the joint representation (Section 4). We explore which components of the text encoder are robust across corpora, and which are sensitive. Second, we investigate the modal representations from the speech and text encoders (Section 5). We inspect the cross-modal consistency loss as a signal of robustness, and the ability for this loss term to generalize across corpora through t-SNE visualization of activations and a retrieval probe task.
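The two modal analyses mentioned above (t-SNE visualization of encoder activations and a retrieval probe) can be illustrated with a minimal sketch. The snippet below is not the authors' code: it uses synthetic placeholder activations (`speech_emb`, `text_emb` are hypothetical names), whereas the paper would use pooled outputs of a Maestro-style shared speech-text encoder. It shows the general shape of both probes: projecting paired speech and text embeddings into one 2-D t-SNE map, and measuring speech-to-text recall@1 by nearest-neighbor retrieval under cosine similarity.

```python
# Sketch of the two cross-modal analyses: (1) t-SNE of paired speech/text
# encoder activations, (2) a speech->text retrieval probe.
# All embeddings here are synthetic stand-ins, not real encoder outputs.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder activations: N paired utterances, d-dimensional embeddings.
N, d = 500, 256
speech_emb = rng.normal(size=(N, d))
text_emb = speech_emb + 0.1 * rng.normal(size=(N, d))  # paired, perturbed

# --- (1) t-SNE of both modalities in a single 2-D projection ---------------
joint = np.concatenate([speech_emb, text_emb], axis=0)
proj = TSNE(n_components=2, init="pca", random_state=0).fit_transform(joint)
plt.scatter(*proj[:N].T, s=5, label="speech")
plt.scatter(*proj[N:].T, s=5, label="text")
plt.legend()
plt.title("t-SNE of speech vs. text encoder activations")
plt.savefig("tsne_modalities.png")

# --- (2) retrieval probe: does each speech embedding retrieve its text? ----
def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

sim = normalize(speech_emb) @ normalize(text_emb).T  # (N, N) cosine similarities
recall_at_1 = np.mean(sim.argmax(axis=1) == np.arange(N))
print(f"speech->text retrieval recall@1: {recall_at_1:.3f}")
```

Under a well-aligned shared representation, the two modalities should overlap in the t-SNE map and recall@1 should be high; a consistently low retrieval score would signal that the cross-modal consistency loss is failing to generalize to that corpus.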
Apr-27-2023
- Genre:
- Research Report (0.50)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (1.00)