Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models
