Modeling Paragraph-Level Vision-Language Semantic Alignment for Multi-Modal Summarization
