Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning