Exploring Efficient Foundational Multi-modal Models for Video Summarization
