Improving Multimodal fusion via Mutual Dependency Maximisation
Colombo, Pierre, Chapuis, Emile, Labeau, Matthieu, Clavel, Chloe
–arXiv.org Artificial Intelligence
Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics. Acknowledging that humans communicate through a variety of channels (i.e., visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, a considerable effort has been made on developing complex architectures allowing the fusion of these modalities. However, such systems are mainly trained by minimising simple losses such as $L_1$ or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to $4.3$ points of accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: \texttt{CMU-MOSI} and \texttt{CMU-MOSEI}. Our method not only achieves a new SOTA on both datasets but also produces representations that are more robust to modality drops. Finally, a by-product of our method is a statistical network which can be used to interpret the high-dimensional representations learnt by the model.
Sep-9-2021
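The abstract describes adding penalties that maximise the dependency between modality representations, with a "statistical network" as a by-product. Below is a minimal, hypothetical sketch of how such a penalty could look, assuming a MINE-style (Donsker-Varadhan) lower bound on mutual information between two modality embeddings; the class and function names, dimensions, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a mutual-dependency penalty in the spirit of the paper.
# Assumes a MINE-style Donsker-Varadhan bound with a small "statistical network"
# T(z_a, z_b) that scores pairs of modality representations.
import torch
import torch.nn as nn


class StatisticsNetwork(nn.Module):
    """Scores pairs of modality embeddings; higher scores for dependent pairs."""

    def __init__(self, dim_a: int, dim_b: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_a, z_b], dim=-1)).squeeze(-1)


def mutual_dependency_penalty(t_net: StatisticsNetwork,
                              z_a: torch.Tensor,
                              z_b: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan lower bound on I(z_a; z_b) over a batch.

    Joint samples are the aligned pairs in the batch; marginal samples are
    obtained by shuffling one modality. The bound is to be maximised, so it
    is subtracted from the task loss.
    """
    joint = t_net(z_a, z_b).mean()
    z_b_shuffled = z_b[torch.randperm(z_b.size(0))]
    marginal = torch.logsumexp(t_net(z_a, z_b_shuffled), dim=0) \
        - torch.log(torch.tensor(float(z_b.size(0))))
    return joint - marginal


# Illustrative usage: combine with the usual task loss (e.g. L1 on sentiment scores).
# total_loss = l1_loss - lambda_mi * mutual_dependency_penalty(t_net, z_text, z_audio)
```

In this reading, the statistics network doubles as the interpretability tool mentioned in the abstract: once trained, its scores indicate how strongly pairs of learnt representations depend on each other.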