Self-Supervised Multimodal Opinion Summarization