Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models

Shakhadri, Syed Abdul Gaffar, KR, Kruthika, Angadi, Kartik Basavaraj

arXiv.org Artificial Intelligence 

The rapid evolution of deep learning has significantly transformed Automatic Speech Recognition (ASR), shifting from traditional systems such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to advanced end-to-end neural architectures. While innovations such as Connectionist Temporal Classification (CTC) and attentionbased encoder-decoder models have established new baselines [1], transformer-based models like OpenAI's Whisper have further pushed the boundaries, setting state-of-the-art benchmarks for multilingual, multitask ASR systems [2]. Despite their successes, transformer architectures face inherent challenges in scaling to long sequences, particularly those encountered in extended audio recordings.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found