Machine Translation



u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality

Neural Information Processing Systems

By utilizing modality dropout during pre-training, we demonstrate that a single fine-tuned model can achieve performance on par with or better than that of state-of-the-art modality-specific models.
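A minimal sketch of the modality-dropout idea, under stated assumptions: two modalities (audio and video feature vectors), a dropped modality replaced by zeros, and early fusion by concatenation. The helper names `modality_dropout` and `fuse` and the parameters `p_both` and `p_drop_audio` are hypothetical and illustrative only; this is not the u-HuBERT implementation.

```python
import random
from typing import List, Optional, Tuple

Features = List[float]


def modality_dropout(
    audio: Features,
    video: Features,
    p_both: float = 0.5,        # hypothetical: probability of keeping both modalities
    p_drop_audio: float = 0.5,  # hypothetical: if one is dropped, chance it is audio
    rng: Optional[random.Random] = None,
) -> Tuple[Features, Features]:
    """Randomly zero out at most one modality (illustrative sketch)."""
    rng = rng or random.Random()
    if rng.random() < p_both:
        return audio, video                      # keep both modalities
    if rng.random() < p_drop_audio:
        return [0.0] * len(audio), video         # drop audio, keep video
    return audio, [0.0] * len(video)             # drop video, keep audio


def fuse(audio: Features, video: Features) -> Features:
    """Early fusion by concatenation (an assumption for this sketch)."""
    return audio + video


if __name__ == "__main__":
    rng = random.Random(0)
    a, v = modality_dropout([0.5, 1.0], [2.0], rng=rng)
    print(fuse(a, v))
```

Because the fused input always has the same layout whether one or both modalities are present, a single downstream model sees audio-only, video-only, and audio-visual inputs during training, which is the intuition behind fine-tuning one model for all of them.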

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Neural Information Processing Systems

Using a similarity metric in the shared multimodal embedding space, we perform mining of audio in German, French, Spanish and English from LibriVox against billions of sentences from CommonCrawl.
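A minimal sketch of similarity-based mining, assuming speech segments and sentences have already been encoded into vectors in one embedding space. It uses plain cosine similarity with a nearest-neighbour search and a fixed threshold; the listing does not say which metric the paper uses, and mining pipelines often use margin-based scores instead. The function `mine_pairs` and its `threshold` parameter are hypothetical, and a brute-force loop like this would not scale to billions of sentences.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mine_pairs(
    speech: List[Vector],
    text: List[Vector],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> List[Tuple[int, int, float]]:
    """For each speech embedding, keep its most similar sentence if above threshold."""
    if not text:
        return []
    pairs = []
    for i, s in enumerate(speech):
        scores = [cosine(s, t) for t in text]
        j = max(range(len(text)), key=scores.__getitem__)  # nearest sentence
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    speech = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
    print(mine_pairs(speech, text))
```

In practice, large-scale mining replaces the inner loop with an approximate nearest-neighbour index, but the scoring-and-threshold structure stays the same.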