Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding