Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
