Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models

arXiv.org Artificial Intelligence

The rapid evolution of deep learning has significantly transformed Automatic Speech Recognition (ASR), shifting the field from traditional systems built on Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to end-to-end neural architectures. Innovations such as Connectionist Temporal Classification (CTC) and attention-based encoder-decoder models established new baselines [1], and transformer-based models like OpenAI's Whisper have pushed further, setting state-of-the-art benchmarks for multilingual, multitask ASR [2]. Despite these successes, transformer architectures face inherent challenges in scaling to long sequences, particularly those encountered in extended audio recordings.