Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Sep-8-2024–arXiv.org Artificial Intelligence

In this paper, we propose a solution for the semi-supervised learning track (MER-SEMI) in MER2024. First, in order to enhance the performance of the feature extractor on sentiment classification tasks,we fine-tuned video and text feature extractors, specifically CLIP-vit-large and Baichuan-13B, using labeled data. This approach effectively preserves the original emotional information conveyed in the videos. Second, we propose an Audio-Guided Transformer (AGT) fusion mechanism, which leverages the robustness of Hubert-large, showing superior effectiveness in fusing both inter-channel and intra-channel information. Third, To enhance the accuracy of the model, we iteratively apply self-supervised learning by using high-confidence unlabeled data as pseudo-labels. Finally, through black-box probing, we discovered an imbalanced data distribution between the training and test sets. Therefore, We adopt a prior-knowledge-based voting mechanism. The results demonstrate the effectiveness of our strategy, ultimately earning us third place in the MER-SEMI track.

information, mechanism, modality, (13 more...)

arXiv.org Artificial Intelligence

Sep-8-2024

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.05)
- North America > United States
  - New York > New York County > New York City (0.04)
- Asia > China
  - Beijing > Beijing (0.05)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Information Extraction (0.86)
  - Representation & Reasoning > Information Fusion (0.65)
  - Machine Learning
    - Inductive Learning (0.90)
    - Neural Networks > Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found