Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

Jonathan Chang, Stefan Scherer

arXiv.org Machine Learning 

ABSTRACT

Automatically assessing emotional valence in human speech has historically been a difficult task for machine learning algorithms. The subtle changes in a speaker's voice that indicate positive or negative emotional states are often "overshadowed" by voice characteristics relating to emotional intensity or emotional activation. In this work, we explore a representation learning approach that automatically derives discriminative representations of emotional speech. In particular, we investigate two machine learning strategies to improve classifier performance: (1) utilization of unlabeled data using a deep convolutional generative adversarial network (DCGAN), and (2) multitask learning. Our speaker-independent classification experiments show that the use of unlabeled data in particular improves classifier performance, and that both fully supervised baseline approaches are outperformed considerably. We improve the classification of emotional valence on a discrete 5-point scale to 43.88% and on a 3-point scale to 49.80%, which is competitive with state-of-the-art performance.

Index Terms-- Machine Learning, Affective Computing, Semi-supervised Learning, Deep Learning

1. INTRODUCTION

Machine learning in general, and affective computing in particular, rely on good data representations or features that discriminate well in classification and regression experiments, such as emotion recognition from speech.
