A Cascaded Architecture for Extractive Summarization of Multimedia Content via Audio-to-Text Alignment
