Word Discovery in Visually Grounded, Self-Supervised Speech Models

Peng, Puyuan; Harwath, David

arXiv.org Artificial Intelligence 

... powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 ...

Our method is simple: it involves applying a binary threshold to the self-attention maps produced by the model, and extracting contiguous temporal regions of the speech signal with ...
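The thresholding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the attention for one head has already been reduced to a single weight per speech frame, and the function name `contiguous_regions`, the example weights, and the threshold value of 0.1 are all hypothetical choices made for illustration.

```python
import numpy as np

def contiguous_regions(attention, threshold):
    """Return (start, end) frame index pairs, end exclusive, for every
    contiguous run of frames whose attention weight exceeds `threshold`.

    Assumption: `attention` is a 1-D sequence with one weight per frame.
    """
    # Binary threshold: True where the frame is attended to strongly enough.
    mask = np.asarray(attention) > threshold
    # Pad with False on both sides so runs touching the edges are closed.
    padded = np.concatenate(([False], mask, [False])).astype(np.int8)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # False -> True transitions
    ends = np.flatnonzero(diff == -1)    # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# Hypothetical attention weights over nine frames.
attn = np.array([0.01, 0.02, 0.30, 0.25, 0.03, 0.01, 0.20, 0.15, 0.03])
segments = contiguous_regions(attn, threshold=0.1)
print(segments)  # [(2, 4), (6, 8)]
```

Each returned pair marks one candidate segment in frame indices. Converting those indices to timestamps would require the model's frame rate, which is not stated here.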
