A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

Yu, Licheng, Tan, Hao, Bansal, Mohit, Berg, Tamara L.

arXiv.org Artificial Intelligence 

Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer . The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer's feedback. W e demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets. Project and demo page: https://vision.cs.unc.edu/refer.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found