Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models