Self-Supervised Multimodal Opinion Summarization
Jinbae Im, Moonki Kim, Hoyeop Lee, Hyunsouk Cho, Sehee Chung
–arXiv.org Artificial Intelligence
Recently, opinion summarization, which generates a summary from multiple reviews, has been conducted in a self-supervised manner by treating a sampled review as a pseudo summary. However, non-text data such as images and metadata related to reviews have received less attention. To exploit the abundant information contained in non-text data, we propose a self-supervised multimodal opinion summarization framework called MultimodalSum. Our framework obtains a representation of each modality using a separate encoder for each modality, and the text decoder generates a summary from these representations. To resolve the inherent heterogeneity of multimodal data, we propose a multimodal training pipeline. We first pretrain the text encoder--decoder based solely on text modality data. Subsequently, we pretrain the non-text modality encoders by using the pretrained text decoder as a pivot toward a homogeneous representation of multimodal data. Finally, to fuse multimodal representations, we train the entire framework in an end-to-end manner. We demonstrate the superiority of MultimodalSum through experiments on Yelp and Amazon datasets.
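The sketch below illustrates the three-stage training pipeline the abstract describes, assuming a PyTorch-style setup. The toy encoder/decoder modules, dimensions, and the fusion-by-concatenation step are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of the MultimodalSum training pipeline (assumed toy modules).
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 128

class TextEncoderDecoder(nn.Module):
    """Toy stand-in for the pretrained text encoder--decoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.decoder = nn.GRU(DIM, DIM, batch_first=True)
        self.lm_head = nn.Linear(DIM, VOCAB)

    def encode(self, src):
        out, _ = self.encoder(self.embed(src))
        return out                                    # (B, T, DIM) text memory

    def decode(self, tgt, memory):
        # Toy conditioning: add the mean of the (possibly multimodal) memory.
        ctx = memory.mean(dim=1, keepdim=True)
        out, _ = self.decoder(self.embed(tgt) + ctx)
        return self.lm_head(out)                      # (B, T, VOCAB) logits

class ImageEncoder(nn.Module):
    """Toy non-text encoder projecting image features into the decoder's space."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.proj = nn.Linear(feat_dim, DIM)

    def forward(self, img_feats):                     # (B, N, feat_dim)
        return self.proj(img_feats)                   # (B, N, DIM)

def lm_loss(logits, labels):
    return nn.functional.cross_entropy(logits.reshape(-1, VOCAB), labels.reshape(-1))

text_model, image_enc = TextEncoderDecoder(), ImageEncoder()
src = torch.randint(0, VOCAB, (2, 20))               # other reviews (input)
pseudo = torch.randint(0, VOCAB, (2, 10))             # sampled review as pseudo summary
img_feats = torch.randn(2, 3, 2048)                   # e.g., CNN features per image

# Stage 1: pretrain the text encoder--decoder on text-only data.
opt = torch.optim.Adam(text_model.parameters())
opt.zero_grad()
lm_loss(text_model.decode(pseudo, text_model.encode(src)), pseudo).backward()
opt.step()

# Stage 2: pretrain the non-text encoder with the frozen text decoder as pivot,
# so its representations live in a space the decoder can already generate from.
for p in text_model.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(image_enc.parameters())
opt.zero_grad()
lm_loss(text_model.decode(pseudo, image_enc(img_feats)), pseudo).backward()
opt.step()

# Stage 3: train everything end to end, fusing the modality representations.
for p in text_model.parameters():
    p.requires_grad = True
opt = torch.optim.Adam(list(text_model.parameters()) + list(image_enc.parameters()))
opt.zero_grad()
fused = torch.cat([text_model.encode(src), image_enc(img_feats)], dim=1)
lm_loss(text_model.decode(pseudo, fused), pseudo).backward()
opt.step()
```

The staged schedule is the key design choice the abstract highlights: by pivoting the non-text encoders through the already-trained text decoder before joint training, the heterogeneous modality representations are made compatible before they are fused.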
May-27-2021
- Country:
- Asia (0.28)
- Genre:
- Research Report > Experimental Study (0.46)
- Industry:
- Consumer Products & Services (0.67)
- Technology: