Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
