A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Yu, Licheng, Tan, Hao, Bansal, Mohit, Berg, Tamara L.
arXiv.org Artificial Intelligence
Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The speaker and listener modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer's feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets. Project and demo page: https://vision.cs.unc.edu/refer.
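The abstract says the three modules are trained jointly, but it does not give their loss functions. The sketch below illustrates one plausible way such a combined objective could be assembled. It is a hypothetical illustration, not the authors' formulation. Every name in it (`speaker_loss`, `listener_loss`, `reinforcer_reward`, `reinforce_surrogate`, `joint_loss`, and the weights `lam_s`, `lam_l`, `lam_r`) is an assumption. It further assumes three ingredients: a cross-entropy loss for the speaker, a hinge ranking loss for the listener, and a REINFORCE (score-function) term that turns the reinforcer's reward on a *sampled* expression into a training signal. A score-function term is used because sampling discrete words is not differentiable.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax over a vocabulary of logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def speaker_loss(token_logits: List[List[float]], target_tokens: List[int]) -> float:
    # Hypothetical speaker term: negative log-likelihood of the
    # ground-truth expression, summed over its tokens.
    return -sum(log_softmax(l)[t] for l, t in zip(token_logits, target_tokens))


def listener_loss(score_pos: float, score_neg: float, margin: float = 0.1) -> float:
    # Hypothetical listener term: hinge ranking loss asking the referred
    # object to outscore a distractor object by at least `margin`.
    return max(0.0, margin + score_neg - score_pos)


def reinforcer_reward(match_score: float) -> float:
    # Hypothetical reinforcer: squash a (sampled expression, object)
    # compatibility score into a reward in (0, 1) with a sigmoid.
    return 1.0 / (1.0 + math.exp(-match_score))


def reinforce_surrogate(sample_log_prob: float, reward: float, baseline: float = 0.0) -> float:
    # REINFORCE surrogate: in an autodiff framework, the gradient of this
    # value w.r.t. the speaker's parameters is the policy-gradient estimate
    # for the sampled expression. The reward itself is treated as a constant.
    return -(reward - baseline) * sample_log_prob


def joint_loss(l_speaker: float, l_listener: float, l_reinforce: float,
               lam_s: float = 1.0, lam_l: float = 1.0, lam_r: float = 1.0) -> float:
    # Weighted sum of the three terms; the weights are hypothetical.
    return lam_s * l_speaker + lam_l * l_listener + lam_r * l_reinforce


if __name__ == "__main__":
    ls = speaker_loss([[2.0, 0.5, -1.0], [0.1, 1.5, 0.0]], [0, 1])
    ll = listener_loss(score_pos=0.4, score_neg=0.6)
    lr = reinforce_surrogate(sample_log_prob=-1.2, reward=reinforcer_reward(0.8), baseline=0.5)
    print(joint_loss(ls, ll, lr))
```

The baseline in `reinforce_surrogate` is a common variance-reduction device for policy-gradient estimates. Whether and how the paper uses one is not stated in the abstract.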
Apr-17-2017