Transformer-based encoder-encoder architecture for Spoken Term Detection
Švec, Jan, Šmídl, Luboš, Lehečka, Jan
arXiv.org Artificial Intelligence
The paper presents a method for spoken term detection based on the Transformer architecture. We propose the encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of the putative hit is computed using the calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer, and the proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets based on USC Shoah Foundation Visual History Archive (MALACH).

In this work, we do not focus on the direct processing of the input speech signal. Instead, we use the speech recognizer to convert an audio signal into a graphemic recognition hypothesis. The representation of speech at the grapheme level allows preprocessing the input audio into a compact confusion network and further to a sequence of embedding vectors. In [7], we proposed a Deep LSTM architecture for spoken term detection, which uses the projection of both the input speech and searched term into a shared embedding space. The hybrid DNN-HMM speech recognizer produced phoneme confusion networks representing the input speech. The DNN-HMM speech recognizer can be replaced with the Wav2Vec
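The scoring scheme described above can be sketched in a few lines: two encoders map the recognized hypothesis and the searched term into a shared embedding space, and the putative-hit score is a calibrated dot product squashed into [0, 1]. The toy grapheme "encoder" below (random character embeddings with mean pooling) merely stands in for the paper's BERT-like encoders, and the calibration constants `A` and `B` are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Fixed random character embeddings; a stand-in for a learned
# grapheme-level encoder (the paper uses BERT-like encoders).
DIM = 16
rng = np.random.default_rng(0)
CHAR_EMB = {c: rng.standard_normal(DIM) for c in "abcdefghijklmnopqrstuvwxyz "}

def encode(text: str) -> np.ndarray:
    """Toy encoder: mean-pool character embeddings, unit-normalize."""
    vecs = [CHAR_EMB[c] for c in text.lower() if c in CHAR_EMB]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

def std_score(hypothesis: str, term: str, A: float = 5.0, B: float = 0.0) -> float:
    """Calibrated dot product: sigmoid(A * <h, t> + B), a score in [0, 1].

    A and B are illustrative calibration parameters (hypothetical values)."""
    dot = float(encode(hypothesis) @ encode(term))
    return 1.0 / (1.0 + np.exp(-(A * dot + B)))

# An exact match scores higher than an unrelated term.
print(std_score("spoken term", "spoken term") >
      std_score("spoken term", "xyzzy"))  # → True
```

In the actual system, both inputs are sequences and the score is computed per frame of the hypothesis; the calibration turns the raw dot product into a usable detection confidence.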
Nov-2-2022