Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
