A Modular Approach for Multimodal Summarization of TV Shows