An Experimental Study on Generating Plausible Textual Explanations for Video Summarization