Towards Matching Phones and Speech Representations
arXiv.org Artificial Intelligence
Learning phone types from phone instances is a long-standing problem that remains open. In this work, we revisit the problem in the context of self-supervised learning and pose it as one of matching cluster centroids to phone embeddings. We study two key properties that enable matching: whether cluster centroids of self-supervised representations reduce the variability of phone instances, and whether they respect the relationships among phones. We then use the matching result to produce pseudo-labels and introduce a new loss function for improving self-supervised representations. Our experiments show that the matching result captures the relationships among phones. Training with the new loss jointly with regular self-supervised losses, such as APC and CPC, significantly improves downstream phone classification.
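The matching and pseudo-labeling steps described in the abstract can be illustrated with a toy sketch. The abstract does not say which matching procedure or similarity measure the authors use, so everything below is an assumption made for illustration: centroids and phone embeddings are assumed to live in the same vector space, the match is assumed to be one-to-one, and it is scored by total cosine similarity with a brute-force search that only works for a handful of phones. The function names, phone labels, and vectors are all hypothetical.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def match_centroids_to_phones(centroids, phone_embeddings):
    """Find a one-to-one map from centroid index to phone label.

    Assumption: the best match maximizes summed cosine similarity.
    Brute force over permutations, so only suitable for toy sizes.
    """
    phones = list(phone_embeddings)
    assert len(phones) == len(centroids)
    best_score, best_perm = -math.inf, None
    for perm in permutations(phones):
        score = sum(cosine(c, phone_embeddings[p])
                    for c, p in zip(centroids, perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return dict(enumerate(best_perm))


def pseudo_labels(frames, centroids, mapping):
    """Label each frame with the phone matched to its nearest centroid."""
    labels = []
    for x in frames:
        k = max(range(len(centroids)), key=lambda i: cosine(x, centroids[i]))
        labels.append(mapping[k])
    return labels


# Hypothetical 2-D toy data: three cluster centroids and three phone embeddings.
centroids = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
phone_embeddings = {"aa": [0.0, 1.0], "iy": [1.0, 0.0], "eh": [1.0, 1.0]}

mapping = match_centroids_to_phones(centroids, phone_embeddings)
frames = [[1.0, 0.2], [0.2, 1.0], [0.6, 0.5]]
labels = pseudo_labels(frames, centroids, mapping)
print(mapping)
print(labels)
```

In a realistic setting the pseudo-labels would feed an additional classification-style loss trained alongside APC or CPC, and the brute-force search would be replaced by a scalable assignment solver; neither detail is specified in the abstract.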
Oct-26-2023
- Country:
- North America > United States
- Indiana > Lake County > Hammond (0.04)
- South America > Chile
- Genre:
- Research Report (0.64)
- Technology: