An Exploration of Mamba for Speech Self-Supervised Models

Lin, Tzu-Quan, Kuo, Heng-Cheng, Wei, Tzu-Chieh, Cheng, Hsi-Chun, Chen, Chun-Wei, Hsiao, Hsien-Fu, Tsao, Yu, Lee, Hung-yi

Jun-17-2025–arXiv.org Artificial Intelligence

While Mamba has demonstrated strong performance in language modeling, its potential as a speech self-supervised learning (SSL) model remains underexplored, with prior studies limited to isolated tasks. To address this, we explore Mamba-based HuBERT models as alternatives to Transformer-based SSL architectures. Leveraging the linear-time Selective State Space, these models enable fine-tuning on long-context ASR with significantly lower compute. Moreover, they show superior performance when fine-tuned for streaming ASR. Beyond fine-tuning, these models show competitive performance on SUPERB probing benchmarks, particularly in causal settings. Our analysis shows that they yield higher-quality quantized representations and capture speaker-related features more distinctly than Transformer-based models.

In recent years, Transformer-based models and their multi-head self-attention mechanisms have achieved remarkable success across various domains [1]-[3].
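As a rough illustration of why a selective state-space recurrence runs in time linear in the sequence length, the following is a minimal single-channel sketch. It is not the paper's implementation: the function name selective_scan, the scalar weights w_b, w_c, w_delta, and the simplified discretization are all assumptions made for illustration. Each step updates a fixed-size state from the current input only, so the cost per frame is constant and the output at time t depends only on inputs up to t.

```python
import math


def softplus(z):
    # Numerically stable softplus: log(1 + exp(z)).
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def selective_scan(xs, a, w_b, w_c, w_delta, b_delta=0.0):
    """Single-channel selective state-space recurrence (illustrative only).

    xs      : list of scalar inputs, one per time step
    a       : list of N negative decay rates (a diagonal state matrix)
    w_b     : list of N weights; B_t = w_b[i] * x_t is input-dependent
    w_c     : list of N weights; C_t = w_c[i] * x_t is input-dependent
    w_delta : scalar weight; the step size is softplus(w_delta * x_t + b_delta)
    """
    n = len(a)
    h = [0.0] * n          # fixed-size hidden state, independent of len(xs)
    ys = []
    for x in xs:           # one pass over the sequence: O(len(xs) * N)
        delta = softplus(w_delta * x + b_delta)
        for i in range(n):
            a_bar = math.exp(delta * a[i])   # discretized decay
            b_bar = delta * (w_b[i] * x)     # simplified discretized input gate
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(w_c[i] * x * h[i] for i in range(n)))
    return ys


# Hypothetical usage with a two-dimensional state.
outputs = selective_scan([1.0, 2.0, 3.0], [-1.0, -0.5], [1.0, 0.5], [0.3, 0.7], 0.5)
```

Because the state has a fixed size, the same loop can process audio frame by frame, which is the property that makes this family of models a natural fit for causal and streaming settings; self-attention, by contrast, compares every frame with every other frame.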



