Word Discovery in Visually Grounded, Self-Supervised Speech Models
arXiv.org Artificial Intelligence
A powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 models. Our method is simple: it involves applying a binary threshold to the self-attention maps produced by the model and extracting contiguous temporal regions of the speech signal.
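The thresholding-and-extraction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the attention map has already been reduced to one weight per speech frame (for example, the attention a single head assigns to each frame). The function name `segment_by_attention` and the `threshold` value are hypothetical.

```python
from typing import List, Optional, Tuple


def segment_by_attention(attn: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive, where attn > threshold.

    Hypothetical helper: `attn` is assumed to hold one attention weight per
    speech frame (e.g. from a single self-attention head); `threshold` is an
    assumed tuning parameter, not a value taken from the paper.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the currently open region
    for t, weight in enumerate(attn):
        above = weight > threshold  # binary threshold applied per frame
        if above and start is None:
            start = t  # a new contiguous region begins at frame t
        elif not above and start is not None:
            segments.append((start, t))  # region ended at frame t - 1
            start = None
    if start is not None:
        segments.append((start, len(attn)))  # region runs to the final frame
    return segments


if __name__ == "__main__":
    # Toy attention weights over 8 frames (made-up numbers for illustration).
    weights = [0.01, 0.2, 0.3, 0.02, 0.0, 0.15, 0.4, 0.05]
    print(segment_by_attention(weights, threshold=0.1))  # [(1, 3), (5, 7)]
```

Each returned span is a candidate word segment in frame units. HuBERT and wav2vec2.0 emit one frame every 20 ms for 16 kHz audio, so a span such as `(1, 3)` would cover roughly 20 to 60 ms of speech. Clustering the resulting segments into word types is a separate step not shown here.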
Jun-19-2023
- Country:
- North America > United States > Texas > Travis County > Austin (0.04)
- Genre:
- Research Report (1.00)
- Technology: