Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Neural Information Processing Systems 

Recently, there has been increasing interest in learning the semantics of a language directly and solely from raw speech [24,27,28].
