Variational Fusion for Multimodal Sentiment Analysis
Navonil Majumder, Soujanya Poria, Gangeshwar Krishnamurthy, Niyati Chhaya, Rada Mihalcea, Alexander Gelbukh
arXiv.org Artificial Intelligence
This is important, as more and more enterprises tend to make business decisions based on the user sentiment toward their products expressed through such opinion videos. Multimodal fusion is considered a key step in multimodal sentiment analysis. Most recent work on multimodal fusion (Poria et al., 2017; Zadeh et al., 2018c) has focused on obtaining a multimodal representation from the independent unimodal representations. Our approach takes this strategy one step further by also requiring that the original unimodal representations be reconstructable from the unified multimodal representation. The motivation is the intuition that the different modalities are expressions of a single state of mind: if we assume that the fused representation captures this mind-state/sentiment/emotion, then ensuring that it can be mapped back to the unimodal representations should improve the quality of the multimodal representation. In this paper, we empirically support this claim by showing that our approach outperforms the state of the art in multimodal fusion. We employ a variational autoencoder (VAE) (Kingma and Welling, 2014), whose encoder network generates a latent representation from the unimodal representations.
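To make the fusion scheme concrete, below is a minimal PyTorch sketch of the idea described in the abstract: an encoder maps the concatenated unimodal representations to a latent fused representation, modality-specific decoders reconstruct each unimodal input from it, and a classifier predicts sentiment from the latent code. This is not the authors' implementation; the feature dimensions (300/74/35 for text/audio/video), layer sizes, and the KL weight `beta` are illustrative assumptions.

```python
# Minimal sketch of VAE-based multimodal fusion (assumed architecture,
# not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalFusion(nn.Module):
    def __init__(self, dims=(300, 74, 35), hidden=256, latent_dim=128):
        super().__init__()
        # Encoder: concatenated unimodal features -> latent mean / log-variance
        self.enc = nn.Linear(sum(dims), hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # One decoder per modality: fused latent -> unimodal representation
        self.decoders = nn.ModuleList([nn.Linear(latent_dim, d) for d in dims])
        # Sentiment head on the fused representation
        self.classifier = nn.Linear(latent_dim, 1)

    def forward(self, text, audio, video):
        h = F.relu(self.enc(torch.cat([text, audio, video], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick (Kingma and Welling, 2014)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recons = [dec(z) for dec in self.decoders]
        return self.classifier(z), recons, mu, logvar

def loss_fn(pred, label, recons, inputs, mu, logvar, beta=1e-3):
    # Sentiment loss + reconstruction of every modality + KL regularizer
    task = F.mse_loss(pred.squeeze(-1), label)
    rec = sum(F.mse_loss(r, x) for r, x in zip(recons, inputs))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return task + rec + beta * kl
```

The reconstruction term is what distinguishes this fusion from plain concatenation: the latent code is penalized unless every unimodal representation can be recovered from it, which is exactly the constraint the abstract argues improves the multimodal representation.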
Aug-13-2019