Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned