Learning the joint distribution of two sequences using little or no paired data
Mariooryad, Soroosh, Shannon, Matt, Ma, Siyuan, Bagby, Tom, Kao, David, Stanton, Daisy, Battenberg, Eric, Skerry-Ryan, RJ
–arXiv.org Artificial Intelligence
We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions by only observing unpaired samples from the marginals is only possible under certain conditions in the data distribution, and we discuss under what type of conditional independence assumptions that might ...

A classical ASR approach treats the process of generating speech as a noisy channel. In this framing, text is drawn from some distribution and statistically transformed into speech audio; the speech recognition task is then to invert this generative model to infer the text most likely to have given rise to a given speech waveform. This generative model of speech was historically successful (Baker, 1975; Jelinek, 1976; Rabiner, 1989), but has been superseded in modern discriminative systems by directly modeling the conditional distribution of text given speech (Graves et al., 2006; Amodei et al., 2016). The direct approach has the advantage of allowing limited modeling power to be solely devoted to the task of interest, whereas the generative one can be extremely sensitive to faulty assumptions in the speech audio model, despite the fact that this is not the primary object of interest. However, the generative approach allows learning in a principled way from untranscribed speech audio, something fundamentally impossible in the direct approach.
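As a rough illustration of the noisy channel framing, the decoding rule and the unpaired-speech likelihood can be written with Bayes' rule. The symbols are assumptions introduced here for illustration, not the paper's notation: y stands for a text sequence, x for a speech waveform, p(y) for the text prior, and p(x | y) for the channel that turns text into audio.

```latex
% Noisy channel decoding: pick the text most likely to have produced x.
\hat{y} \;=\; \arg\max_{y} \, p(y \mid x) \;=\; \arg\max_{y} \, p(y)\, p(x \mid y)

% Likelihood of untranscribed speech under the generative model:
% the text is marginalized out.
p(x) \;=\; \sum_{y} p(y)\, p(x \mid y)
```

The second expression suggests why a generative model can, in principle, learn from untranscribed audio, and also why the exact model is hard to work with: the sum runs over every possible text sequence.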
Dec-6-2022
- Country:
  - North America > United States (0.28)
- Genre:
  - Instructional Material (0.46)
  - Research Report (0.64)
- Industry:
  - Health & Medicine (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning
      - Learning Graphical Models (1.00)
      - Neural Networks > Deep Learning (0.89)
      - Statistical Learning (1.00)
    - Natural Language (1.00)
    - Representation & Reasoning (1.00)
    - Speech (1.00)