Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Malan, Simon, van Niekerk, Benjamin, Kamper, Herman
arXiv.org Artificial Intelligence
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints. On the five-language ZeroSpeech benchmarks, our simple approach achieves state-of-the-art results comparable to those of the updated ES-KMeans+ method while being almost five times faster.
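As a rough illustration of the two-step idea described in the abstract (boundaries from the dissimilarity between adjacent features, then clustering of the resulting segments), the sketch below may help. It is not the authors' implementation: the cosine distance, the fixed `threshold`, the local-peak rule, mean pooling of frames, the plain K-means loop, and all function names are assumptions chosen for brevity, and the toy two-dimensional vectors merely stand in for real self-supervised speech features.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def predict_boundaries(features, threshold=0.5):
    """Return frame indices i such that a boundary is placed just before frame i.

    A boundary is hypothesised wherever the dissimilarity between adjacent
    frames is a local peak above `threshold` (an assumed, hand-set value).
    """
    dissim = [cosine_distance(features[i - 1], features[i])
              for i in range(1, len(features))]
    boundaries = []
    for j, d in enumerate(dissim):
        left = dissim[j - 1] if j > 0 else float("-inf")
        right = dissim[j + 1] if j + 1 < len(dissim) else float("-inf")
        if d > threshold and d >= left and d > right:
            boundaries.append(j + 1)  # dissim[j] compares frames j and j + 1
    return boundaries


def segment_embeddings(features, boundaries):
    """Mean-pool the frames of each segment into one fixed-size vector."""
    edges = [0] + list(boundaries) + [len(features)]
    embeddings = []
    for start, end in zip(edges, edges[1:]):
        frames = features[start:end]
        embeddings.append([sum(col) / len(frames) for col in zip(*frames)])
    return embeddings


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain K-means; centroids start at the first k points (needs len(points) >= k)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every segment embedding.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid, keeping it if its cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "utterance": three word-like stretches, the first and last being similar.
frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 0.0], [1.0, 0.0]]
boundaries = predict_boundaries(frames)
embeddings = segment_embeddings(frames, boundaries)
labels = kmeans(embeddings, k=2)  # segments sharing a label form one lexicon entry
```

In this sketch, segments that receive the same cluster label are treated as instances of the same lexicon entry. Unlike a dynamic programming search, the boundary step is a single pass over adjacent frames.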
Sep-22-2024
- Country:
- Africa (0.14)
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (0.72)
- Representation & Reasoning > Optimization (0.82)
- Speech (0.69)