Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

Open in new window