Word Discovery in Visually Grounded, Self-Supervised Speech Models

Jun-19-2023, arXiv.org Artificial Intelligence

... powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 ...

Our method is simple: it simply involves applying a binary threshold to the self-attention maps produced by the model, and extracting contiguous temporal regions of the speech signal with the ...
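
The thresholding step mentioned in the snippet above can be illustrated with a minimal sketch. This is not the authors' implementation: the snippet does not say which layer or head is used, how the attention map is reduced to one weight per speech frame, or what threshold value is chosen, so the function name `contiguous_segments`, the toy weights, and the threshold of 0.1 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def contiguous_segments(
    attention: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, for each run
    of consecutive frames whose attention weight exceeds the threshold."""
    segments = []
    start = None  # index where the current above-threshold run began
    for i, weight in enumerate(attention):
        active = weight > threshold  # binary threshold on the attention weight
        if active and start is None:
            start = i  # a new run opens at this frame
        elif not active and start is not None:
            segments.append((start, i))  # the run closes before this frame
            start = None
    if start is not None:
        # The last run reaches the end of the signal.
        segments.append((start, len(attention)))
    return segments


# Hypothetical attention weights over 10 speech frames (made-up values).
toy_attention = [0.01, 0.30, 0.40, 0.02, 0.01, 0.25, 0.35, 0.30, 0.03, 0.20]
print(contiguous_segments(toy_attention, threshold=0.1))
# prints [(1, 3), (5, 8), (9, 10)]
```

Each returned pair marks one contiguous temporal region of frames. Converting frame indices to seconds would require the model's frame rate, which is not stated here.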

attention segment, machine learning, natural language

Country:
- North America > United States > Texas > Travis County > Austin (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Statistical Learning (0.69)
    - Speech > Speech Recognition (0.46)
