Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

Jan-19-2022–arXiv.org Artificial Intelligence

Abstract: The prediction of valence from speech is an important but challenging problem. The externalization of valence in speech has speaker-dependent cues, which contribute to performances that are often significantly lower than those for other emotional attributes such as arousal and dominance. A practical approach to improve valence prediction from speech is to adapt the models to the target speakers in the test set. Adapting a speech emotion recognition (SER) system to a particular speaker is a hard problem, especially with deep neural networks (DNNs), since it requires optimizing millions of parameters. This study proposes an unsupervised approach to address this problem by searching the train set for speakers whose acoustic patterns are similar to those of the speaker in the test set. Speech samples from the selected speakers are used to create the adaptation set. This approach leverages transfer learning using pre-trained models, which are adapted with these speech samples. We propose three alternative adaptation strategies: unique speaker, oversampling and weighting approaches. These methods differ in how they use the adaptation set to personalize the valence models. The results demonstrate that a valence prediction model can be efficiently personalized with these unsupervised approaches, leading to relative improvements as high as 13.52%.

Index Terms: Speech emotion recognition, adaptation, transfer learning, emotional dimensions, valence.

[...] potential in fields such as human-computer interactions (HCIs), healthcare [1], [2] and behavioral studies [3], [4]. [...] is still a challenging task. The usual formulation to describe emotions is with categorical descriptors such as happiness, sadness, anger and neutral. [...] In particular, the emotional attribute valence is key to understand many behavioral disorders [6], [7]. Although different approaches have been proposed to improve SER systems, the prediction of valence using acoustic [...]
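The speaker-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how acoustic similarity is measured, so everything specific here is an assumption made for illustration: each speaker is summarized by the mean of their utterance-level feature vectors, similarity is cosine similarity, and the function names (`speaker_profiles`, `select_similar_speakers`, `adaptation_indices`) and the toy data are hypothetical. No emotional labels from the test speaker are used, which is what keeps the selection unsupervised. How the resulting adaptation set is then used (unique speaker, oversampling or weighting) is not shown.

```python
import numpy as np


def speaker_profiles(features, speaker_ids):
    """Average each speaker's utterance-level features into one profile vector.

    Assumption: a mean feature vector is an adequate speaker summary.
    """
    speaker_ids = np.asarray(speaker_ids)
    speakers = sorted(set(speaker_ids.tolist()))
    profiles = np.stack(
        [features[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    return speakers, profiles


def select_similar_speakers(test_profile, train_profiles, train_speakers, k=1):
    """Return the k train speakers closest to the test speaker.

    Assumption: closeness is cosine similarity between profile vectors.
    """
    a = test_profile / np.linalg.norm(test_profile)
    b = train_profiles / np.linalg.norm(train_profiles, axis=1, keepdims=True)
    sims = b @ a
    order = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [train_speakers[i] for i in order]


def adaptation_indices(speaker_ids, selected):
    """Indices of the train utterances spoken by the selected speakers."""
    return np.flatnonzero(np.isin(speaker_ids, selected))


# Toy data (hypothetical): six train utterances from three speakers.
train_x = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
)
train_spk = np.array(["A", "A", "B", "B", "C", "C"])

# Unlabeled speech from the target (test) speaker.
test_x = np.array([[0.8, 0.2], [1.0, 0.1]])

speakers, profiles = speaker_profiles(train_x, train_spk)
selected = select_similar_speakers(test_x.mean(axis=0), profiles, speakers, k=1)
adapt_idx = adaptation_indices(train_spk, selected)
```

In this toy case the train utterances indexed by `adapt_idx` would form the adaptation set used to fine-tune a pre-trained valence model for the target speaker.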

artificial intelligence, machine learning, valence

Country:
- Oceania > Australia
  - Queensland > Brisbane (0.04)
- North America
  - United States
    - Texas (0.04)
    - Washington > King County
      - Bellevue (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Arizona > Maricopa County
      - Scottsdale (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Germany (0.04)
  - United Kingdom > England
    - East Sussex > Brighton (0.14)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Netherlands > Gelderland
    - Nijmegen (0.04)
  - France > Auvergne-Rhône-Alpes
    - Lyon > Lyon (0.04)
  - Austria
    - Vienna (0.14)
    - Styria > Graz (0.04)
- Asia
  - Singapore (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)
  - India
    - Telangana > Hyderabad (0.04)
    - NCT
      - New Delhi (0.04)
      - Delhi (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science > Emotion (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.88)
