Temporal aggregation of audio-visual modalities for emotion recognition
