Realizing Video Summarization from the Path of Language-based Semantic Understanding