An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition

Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, Abeer Alwan

arXiv.org Artificial Intelligence 

Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single step variant is applied. Previous work on the CTC alignment-based single step non-autoregressive transformer (CASS-NAT) has shown a large real time factor (RTF) improvement over autoregressive transformers (AT). In this work, we propose several methods to improve the accuracy of the end-to-end CASS-NAT, followed by performance analyses. First, convolution augmented self-attention blocks are applied to both the encoder and decoder modules. Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments.

In addition, Fujita et al. used the idea of the insertion transformer from NMT to generate the output sequence in an arbitrary order [12]. Another recent effective method uses multiple decoders as refiners to perform iterative refinement based on CTC alignments [14]. Theoretically, iterative NAT offers only a limited improvement in inference speed, since multiple iterations are still needed to obtain a competitive result. In contrast, single step NAT, which attempts to generate the output sequence in only one iteration, can achieve a better inference speedup. The idea is to substitute the word embedding in autoregressive models with an acoustic representation for each output token, assuming that language semantics can also be captured.
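To make the "convolution augmented self-attention blocks" concrete, below is a minimal PyTorch sketch of a Conformer-style convolution module, the kind of component such blocks add alongside self-attention. The structure follows the standard Conformer recipe; the layer sizes and names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvModule(nn.Module):
    """Conformer-style convolution module: pointwise conv with GLU,
    depthwise conv, batch norm, Swish activation, pointwise conv,
    all wrapped in a residual connection. Sizes are illustrative."""

    def __init__(self, d_model: int = 256, kernel_size: int = 15):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pointwise1 = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.batch_norm = nn.BatchNorm1d(d_model)
        self.pointwise2 = nn.Conv1d(d_model, d_model, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)        # -> (batch, d_model, time)
        y = F.glu(self.pointwise1(y), dim=1)    # GLU gate halves the channels
        y = F.silu(self.batch_norm(self.depthwise(y)))  # Swish == SiLU
        y = self.pointwise2(y).transpose(1, 2)  # back to (batch, time, d_model)
        return x + y                            # residual connection
```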
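The trigger mask expansion is also simple to sketch. Assuming the CTC alignment yields per-token (start, end) frame boundaries, each token's mask over encoder frames is widened by a few frames on both sides so that small alignment errors do not cut off relevant acoustics; the function name, input format, and expansion width below are hypothetical.

```python
import numpy as np

def expand_trigger_masks(boundaries, num_frames, expand=2):
    """Build a boolean trigger mask per token over encoder frames.

    boundaries: per-token (start, end) frame indices derived from a
                CTC forced alignment (hypothetical input format).
    expand:     frames to widen each boundary on both sides, making
                the mask more robust to CTC alignment errors.
    """
    masks = np.zeros((len(boundaries), num_frames), dtype=bool)
    for i, (start, end) in enumerate(boundaries):
        lo = max(0, start - expand)
        hi = min(num_frames, end + expand)
        masks[i, lo:hi] = True   # token i may attend to frames [lo, hi)
    return masks

# Three tokens over ten frames, each boundary widened by two frames
masks = expand_trigger_masks([(0, 3), (3, 6), (6, 10)], num_frames=10)
```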
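Finally, the substitution of word embeddings with per-token acoustic representations can be sketched in simplified form. The masked average below is a deliberately reduced stand-in: CASS-NAT extracts token-level acoustic embeddings by attending to encoder frames through the trigger mask rather than taking a plain mean, so treat this only as an illustration of the interface.

```python
import torch

def token_acoustic_embeddings(encoder_out, masks):
    """Form one acoustic representation per output token by averaging
    encoder frames inside each token's (expanded) trigger mask.
    Simplified stand-in for the attention-based extraction in CASS-NAT.

    encoder_out: (time, d_model) encoder output for one utterance
    masks:       (num_tokens, time) boolean trigger masks
    """
    m = masks.float()                                 # (U, T)
    summed = m @ encoder_out                          # (U, d_model)
    counts = m.sum(dim=1, keepdim=True).clamp(min=1)  # frames per token
    return summed / counts          # mean over each token's masked frames
```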
