MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network