Visual Reference Resolution using Attention Memory for Visual Dialog
Seo, Paul Hongsuck, Lehrmann, Andreas, Han, Bohyung, Sigal, Leonid
Neural Information Processing Systems
Visual dialog is the task of answering a series of inter-dependent questions about an input image, and it often requires resolving visual references among the questions. This problem differs from visual question answering (VQA), which relies on spatial attention. We propose a novel attention mechanism that exploits past visual attentions to resolve the current reference in the visual dialog scenario. The proposed model is equipped with an associative attention memory that stores a sequence of previous (attention, key) pairs. From this memory, the model retrieves the previous attention that is most relevant to the current question, taking recency into account, in order to resolve potentially ambiguous references.
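The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `AttentionMemory`, the dot-product relevance score, the linear recency penalty, and the `recency_weight` parameter are all assumptions chosen only to make the idea concrete. The memory stores (attention, key) pairs, scores each key against a query derived from the current question, discounts older entries, and returns a softmax-weighted combination of the stored attention maps.

```python
import math


class AttentionMemory:
    """Hypothetical associative memory of (attention, key) pairs."""

    def __init__(self, recency_weight=0.5):
        # Assumed hyperparameter: penalty applied per dialog turn of age.
        self.recency_weight = recency_weight
        self.entries = []  # oldest first

    def write(self, attention, key):
        """Store the attention map used at the current turn with its key."""
        self.entries.append((list(attention), list(key)))

    def read(self, query):
        """Return (retrieved_attention, weights) for the given query vector."""
        if not self.entries:
            raise ValueError("memory is empty")
        n = len(self.entries)
        scores = []
        for i, (_, key) in enumerate(self.entries):
            # Assumed relevance measure: dot product of query and key.
            relevance = sum(q * k for q, k in zip(query, key))
            age = n - 1 - i  # 0 for the most recent entry
            scores.append(relevance - self.recency_weight * age)
        # Numerically stable softmax over the memory slots.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the stored attention maps.
        size = len(self.entries[0][0])
        retrieved = [
            sum(w * att[j] for w, (att, _) in zip(weights, self.entries))
            for j in range(size)
        ]
        return retrieved, weights


# Two past turns, each with an attention map over three image regions.
memory = AttentionMemory(recency_weight=0.5)
memory.write([1.0, 0.0, 0.0], [1.0, 0.0])
memory.write([0.0, 0.0, 1.0], [0.0, 1.0])
retrieved, weights = memory.read([4.0, 0.0])
```

In this toy setup the query matches the key of the older turn strongly enough to outweigh the recency penalty, so the retrieved attention concentrates on the first region. With an uninformative query, the recency term alone decides and the most recent turn is favored.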
Feb-14-2020, 13:43:25 GMT