Video Summarization: Towards Entity-Aware Captions