MMER: Multimodal Multi-task Learning for Speech Emotion Recognition