StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
