'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges
Chiyah-Garcia, Javier, Suglia, Alessandro, Eshghi, Arash, Hastie, Helen
arXiv.org Artificial Intelligence
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarificational Exchanges (CEs): a Clarification Request (CR) and a response. Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models. We use the SIMMC 2.0 dataset to evaluate the ability of different state-of-the-art model architectures to process CEs, with a metric that probes the contextual updates that arise from them in the model. We find that language-based models are able to encode simple multi-modal semantic information and process some CEs, excelling with those related to the dialogue history, whilst multi-modal models can use additional learning objectives to obtain disentangled object representations, which become crucial for handling complex referential ambiguities across modalities.
Jul-28-2023
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Pennsylvania (0.04)
- New York (0.04)
- Washington > King County
- Seattle (0.04)
- Europe
- Italy (0.04)
- Croatia (0.04)
- United Kingdom > Scotland
- City of Edinburgh > Edinburgh (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Denmark > North Jutland
- Aalborg (0.04)
- Asia
- China > Hong Kong (0.04)
- North Korea > Hwanghae-namdo
- Haeju (0.04)
- Genre:
- Personal > Interview (0.83)
- Research Report (0.70)
- Technology: