On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda

Nov-14-2023–arXiv.org Artificial Intelligence

This paper proposes an efficient approach to noisy speech emotion recognition (NSER). Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but they are of limited use against non-stationary noises in real-world environments because of the complexity and uncertainty of such noises. To overcome this limitation, we introduce a new method for NSER that adopts an automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech. We first obtain intermediate layer information from the ASR model as a feature representation for emotional speech and then apply this representation for the downstream [...]

Typically, noisy speech emotion recognition (NSER) is addressed at one of three levels: the signal level, the feature level, and the model level, as outlined by Tiwari et al. [2]. For instance, Pandharipande et al. [3] used a voice activity detector to reduce noise at the signal level. Lachiri et al. [4] introduced a novel approach involving MFCC shifted-delta-cepstral coefficients at the feature level. Tiwari et al. [2] devised a generative noise model at the model level. These studies have proven effective in mitigating the impact of common noise sources, such as white Gaussian noise, on speech-related tasks. However, in real-world settings, a distinct category of noise, such as the sound of high-heeled shoes or door knocking, presents a substantial challenge.
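The idea of taking an intermediate layer of an ASR model as the feature representation can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the encoder is a toy stack of random dense layers standing in for a pretrained ASR encoder, the names `make_layer`, `encoder_hidden_states`, and `utterance_feature` are hypothetical, and the mean pooling over time is one plausible way to obtain an utterance-level vector rather than something the paper states.

```python
import math
import random

def make_layer(in_dim, out_dim, rng):
    """Random dense weights; a toy stand-in for one pretrained encoder layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply_layer(weights, frame):
    """Compute tanh(W @ frame) for a single frame vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, frame)))
            for row in weights]

def encoder_hidden_states(layers, frames):
    """Run all frames through every layer and keep each layer's output."""
    states = []
    current = frames
    for weights in layers:
        current = [apply_layer(weights, frame) for frame in current]
        states.append(current)
    return states

def utterance_feature(layers, frames, layer_index):
    """Mean-pool the chosen intermediate layer over time (pooling is an assumption)."""
    hidden = encoder_hidden_states(layers, frames)[layer_index]
    dim = len(hidden[0])
    return [sum(frame[d] for frame in hidden) / len(hidden) for d in range(dim)]

rng = random.Random(0)
# Three toy layers: 8-dim input frames, 16-dim hidden states.
layers = [make_layer(8, 16, rng), make_layer(16, 16, rng), make_layer(16, 16, rng)]
# Twenty random frames standing in for acoustic features of one noisy utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
# Use the middle layer, not the final one, as the representation.
feature = utterance_feature(layers, frames, layer_index=1)
```

In a real setup, `feature` would be passed to a downstream emotion classifier, and the choice of `layer_index` would be an empirical question.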

Genre:
- Research Report > New Finding (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language (0.95)
  - Speech > Speech Recognition (1.00)
