Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study