Learning the joint distribution of two sequences using little or no paired data

Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan

Dec-6-2022–arXiv.org Artificial Intelligence

We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions by only observing unpaired samples from the marginals is only possible under certain conditions in the data distribution and we discuss under what type of conditional independence assumptions that might ...

A classical ASR approach treats the process of generating speech as a noisy channel. In this framing, text is drawn from some distribution and statistically transformed into speech audio; the speech recognition task is then to invert this generative model to infer the text most likely to have given rise to a given speech waveform. This generative model of speech was historically successful (Baker, 1975; Jelinek, 1976; Rabiner, 1989), but has been superseded in modern discriminative systems by directly modeling the conditional distribution of text, given speech (Graves et al., 2006; Amodei et al., 2016). The direct approach has the advantage of allowing limited modeling power to be solely devoted to the task of interest, whereas the generative one can be extremely sensitive to faulty assumptions in the speech audio model despite the fact that this is not the primary object of interest. However the generative approach allows learning in a principled way from untranscribed speech audio, something fundamentally impossible in the direct approach.
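The noisy channel framing can be made concrete with a toy discrete example. The sketch below is only an illustration and not the model proposed in the paper: the two-word vocabulary, the acoustic symbols `s1` and `s2`, and all probabilities are made-up assumptions. It shows recognition as inversion of the generative model via Bayes' rule, and the marginal likelihood of speech, which is the quantity a generative model can score without any transcript.

```python
# Toy noisy-channel model. All names and numbers are hypothetical, for illustration only.
prior = {"yes": 0.6, "no": 0.4}  # p(text)
channel = {                      # p(speech | text) over two made-up acoustic symbols
    "yes": {"s1": 0.7, "s2": 0.3},
    "no": {"s1": 0.2, "s2": 0.8},
}


def marginal_likelihood(speech):
    """p(speech) = sum over text of p(text) * p(speech | text); needs no transcript."""
    return sum(prior[t] * channel[t][speech] for t in prior)


def posterior(speech):
    """p(text | speech) by Bayes' rule, i.e. inverting the generative model."""
    evidence = marginal_likelihood(speech)
    return {t: prior[t] * channel[t][speech] / evidence for t in prior}


def decode(speech):
    """Return the text most likely to have given rise to the observed speech."""
    post = posterior(speech)
    return max(post, key=post.get)


if __name__ == "__main__":
    for s in ("s1", "s2"):
        print(s, decode(s), posterior(s), marginal_likelihood(s))
```

In a real system the sum over all possible texts is intractable, which is the kind of difficulty the abstract says motivates a variational approximation.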


