Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming

Malan, Simon, van Niekerk, Benjamin, Kamper, Herman

Sep-22-2024–arXiv.org Artificial Intelligence

We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints. On the five-language ZeroSpeech benchmarks, our simple approach achieves state-of-the-art results on par with the new ES-KMeans+ method, while being almost five times faster.
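
The two-stage strategy described in the abstract (boundaries from dissimilarity between adjacent features, then clustering of the resulting segments) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature extractor, the dissimilarity measure, the boundary rule, or the clustering setup. The cosine distance, the fixed `threshold`, the local-peak rule, mean pooling, and the small NumPy k-means below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def adjacent_dissimilarity(features):
    """Cosine distance between each frame and the next one; shape (T-1,)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return 1.0 - np.sum(normed[:-1] * normed[1:], axis=1)

def predict_boundaries(features, threshold=0.5):
    """Frame indices where a new segment starts (assumed rule: local peak above threshold)."""
    d = adjacent_dissimilarity(features)
    boundaries = []
    for t in range(len(d)):
        left = d[t - 1] if t > 0 else -np.inf
        right = d[t + 1] if t < len(d) - 1 else -np.inf
        if d[t] > threshold and d[t] >= left and d[t] > right:
            boundaries.append(t + 1)  # boundary lies between frame t and frame t + 1
    return boundaries

def pool_segments(features, boundaries):
    """Mean-pool the frames of each segment into one embedding (assumed pooling)."""
    edges = [0] + list(boundaries) + [len(features)]
    return np.stack([features[a:b].mean(axis=0) for a, b in zip(edges[:-1], edges[1:])])

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

# Toy "utterance": three runs of identical frames standing in for real
# self-supervised features (which would come from a pretrained speech model).
features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3)

boundaries = predict_boundaries(features)          # [3, 6]
segments = pool_segments(features, boundaries)     # one embedding per segment
labels = kmeans(segments, k=2)                     # cluster IDs act as lexicon entries
```

In this toy input the first and third segments are identical, so they always receive the same cluster label, which is the sense in which clustering the predicted segments yields a lexicon. Unlike the dynamic programming methods the abstract contrasts against, nothing here searches over alternative segmentations: boundaries are fixed in one pass before clustering.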

artificial intelligence, machine learning, natural language

Country:
- Africa (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language (0.72)
  - Representation & Reasoning > Optimization (0.82)
  - Speech (0.69)