Learning Distinct and Representative Modes for Image Captioning