Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks