Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
