ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition
Abouzeid, Ali; Elbouardi, Bilal; Maged, Mohamed; Shehata, Shady
arXiv.org Artificial Intelligence
Speech emotion recognition is vital for human-computer interaction, particularly for low-resource languages like Arabic, which face challenges due to limited data and research. We introduce ArabEmoNet, a lightweight architecture designed to overcome these limitations and deliver state-of-the-art performance. Unlike previous systems relying on discrete MFCC features and 1D convolutions, which miss nuanced spectro-temporal patterns, ArabEmoNet uses Mel spectrograms processed through 2D convolutions, preserving critical emotional cues often lost in traditional methods. While recent models favor large-scale architectures with millions of parameters, ArabEmoNet achieves superior results with just 1 million parameters, 90 times smaller than HuBERT base and 74 times smaller than Whisper. This efficiency makes it ideal for resource-constrained environments. ArabEmoNet advances Arabic speech emotion recognition, offering exceptional performance and accessibility for real-world applications.
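To make the size claim concrete, the following is a rough parameter-budget sketch for a 2D CNN-BiLSTM-attention stack operating on Mel spectrograms. Every layer size here (`N_MELS`, `CHANNELS`, `HIDDEN`, `N_CLASSES`, the 3x3 kernels, the 2x2 pooling after each convolution) is a hypothetical value chosen for illustration; the abstract does not give the actual configuration, so this shows only that such a design can plausibly stay on the order of 1 million parameters, and what baseline sizes the quoted 90x and 74x ratios would imply.

```python
# Parameter-budget sketch for a hybrid 2D CNN + BiLSTM + attention model.
# All layer sizes below are ASSUMPTIONS for illustration, not the paper's values.

N_MELS = 64                  # hypothetical number of Mel bands
CHANNELS = [1, 16, 32, 64]   # hypothetical conv channel progression
HIDDEN = 128                 # hypothetical LSTM hidden size per direction
N_CLASSES = 4                # hypothetical number of emotion classes


def conv2d_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of one 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def bilstm_params(input_size, hidden_size):
    """Both directions of a one-layer LSTM, using one bias vector per gate."""
    per_direction = 4 * (hidden_size * (input_size + hidden_size) + hidden_size)
    return 2 * per_direction


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def model_params():
    """Total parameter count of the hypothetical stack."""
    conv = sum(conv2d_params(i, o) for i, o in zip(CHANNELS, CHANNELS[1:]))
    # Assume 2x2 pooling after each conv block halves the Mel axis each time.
    freq_bins = N_MELS // 2 ** (len(CHANNELS) - 1)
    lstm_in = CHANNELS[-1] * freq_bins          # features per time step
    lstm = bilstm_params(lstm_in, HIDDEN)
    attention = linear_params(2 * HIDDEN, 1)    # one score per time step
    classifier = linear_params(2 * HIDDEN, N_CLASSES)
    return conv + lstm + attention + classifier


def implied_baseline_params(ratio, small_model=1_000_000):
    """Baseline size implied by an 'N times smaller' claim."""
    return ratio * small_model


if __name__ == "__main__":
    print("hypothetical model:", model_params())
    print("implied by 90x:", implied_baseline_params(90))
    print("implied by 74x:", implied_baseline_params(74))
```

In this sketch the recurrent layer dominates the budget, so the amount of frequency pooling before the BiLSTM is the main lever on total size. Note that some frameworks store two bias vectors per LSTM gate, which would raise the count slightly.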
Sep-3-2025
- Country:
- Africa > Middle East
- Algeria > Biskra Province > Biskra (0.04)
- Asia > Middle East
- Saudi Arabia (0.04)
- Syria (0.04)
- Genre:
- Research Report (0.64)
- Technology: