Multi-Modality Multi-Loss Fusion Network

Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg

arXiv.org Artificial Intelligence 

The multimodal affective computing field has seen significant advances in feature extraction and multimodal fusion methodologies in recent years. By combining audio, text and visual signals, these models offer a more comprehensive, nuanced understanding of human emotions. However, there are still limitations: hand-crafted feature extraction algorithms often lack flexibility and generalization across diverse tasks. To overcome these limitations, recent studies have proposed fully end-to-end models that optimize both feature extraction and learning processes jointly (Dai et al., 2021). Our work extracts feature representations from pre-trained models for different modalities and combines them in an end-to-end manner, which provides a comprehensive and adaptable solution for multimodal […].

We compare different methods for extracting audio features as well as different fusion network methods to combine audio and text signals to identify the best-performing procedures. We find that the addition of audio signals consistently improves performance and also that our transformer fusion network further enhances results for most metrics and achieves state-of-the-art results across all datasets, indicating its efficacy in enhancing cross-modality modeling and its potential for multimodal emotion detection. From multi-loss training, we also observe that 1) using distinct labels for each modality in multi-loss training significantly benefits the models' performance, and 2) training on multimodal features improves not only the overall model performance but also the model's accuracy on the single-modality subnet.
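As a rough illustration of the ideas summarized above (not the authors' released implementation), the sketch below assumes pre-extracted text and audio feature sequences, a small transformer encoder for fusion, and a multi-loss objective in which each single-modality subnet is supervised with its own label. All module names, dimensions, and the MSE losses are illustrative assumptions.

```python
# Hypothetical sketch of a multi-loss fusion network: transformer fusion over
# pre-extracted text and audio features, with separate text, audio, and fused
# prediction heads, each supervised by its own label.
import torch
import torch.nn as nn

class MultiLossFusionNet(nn.Module):
    def __init__(self, text_dim=768, audio_dim=512, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Project each modality's pre-trained features into a shared space.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        # Learned type embeddings mark which tokens come from which modality.
        self.type_emb = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One head per subnet: text-only, audio-only, and fused.
        self.text_head = nn.Linear(d_model, 1)
        self.audio_head = nn.Linear(d_model, 1)
        self.fusion_head = nn.Linear(d_model, 1)

    def forward(self, text_feats, audio_feats):
        # text_feats: (B, Lt, text_dim); audio_feats: (B, La, audio_dim)
        t = self.text_proj(text_feats) + self.type_emb.weight[0]
        a = self.audio_proj(audio_feats) + self.type_emb.weight[1]
        fused = self.fusion(torch.cat([t, a], dim=1)).mean(dim=1)
        return {
            "text": self.text_head(t.mean(dim=1)).squeeze(-1),
            "audio": self.audio_head(a.mean(dim=1)).squeeze(-1),
            "fusion": self.fusion_head(fused).squeeze(-1),
        }

def multi_loss(preds, y_text, y_audio, y_multi, weights=(1.0, 1.0, 1.0)):
    # Multi-loss objective: each subnet gets its own (distinct) label.
    mse = nn.functional.mse_loss
    return (weights[0] * mse(preds["fusion"], y_multi)
            + weights[1] * mse(preds["text"], y_text)
            + weights[2] * mse(preds["audio"], y_audio))

if __name__ == "__main__":
    model = MultiLossFusionNet()
    text = torch.randn(8, 20, 768)   # e.g. token-level text embeddings
    audio = torch.randn(8, 50, 512)  # e.g. frame-level acoustic embeddings
    preds = model(text, audio)
    loss = multi_loss(preds, torch.randn(8), torch.randn(8), torch.randn(8))
    loss.backward()
    print(loss.item())
```

Keeping separate heads over shared fused representations is one plausible way to realize the paper's second observation: gradients from the multimodal loss also update the projections used by the single-modality subnets, so multimodal training can improve subnet accuracy as well.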
