GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization