Schlangen, David
Instruction Clarification Requests in Multimodal Collaborative Dialogue Games: Tasks, and an Analysis of the CoDraw Dataset
Madureira, Brielen, Schlangen, David
In visual instruction-following dialogue games, players can engage in repair mechanisms in the face of an ambiguous or underspecified instruction that cannot be fully mapped to actions in the world. In this work, we annotate Instruction Clarification Requests (iCRs) in CoDraw, an existing dataset of interactions in a multimodal collaborative dialogue game. We show that it contains lexically and semantically diverse iCRs, produced spontaneously by players who decide to clarify in order to solve the task successfully. With 8.8k iCRs found in 9.9k dialogues, CoDraw-iCR (v1) is a large corpus of spontaneous iCRs, making it a valuable resource for data-driven research on clarification in dialogue. We then formalise and provide baseline models for two tasks, determining when to make an iCR and recognising iCRs, in order to investigate to what extent these tasks are learnable from data.
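As a rough illustration of the first task only, "deciding when to make an iCR" can be framed as binary classification over the instruction just received. The sketch below is a minimal assumption-laden stand-in, not the paper's baseline: the TF-IDF features, the toy instructions, and the labels are all invented for illustration.

# Minimal sketch (not the CoDraw-iCR baseline): "when to clarify" as
# binary classification over the incoming instruction. Features, data,
# and labels here are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical instructions paired with labels: 1 = the follower asked an iCR next.
instructions = [
    "put a small tree on the left",
    "add the sun in the top corner",
    "place it next to the other one",   # ambiguous referent
    "move the boy a bit",               # underspecified amount
]
asked_icr = [0, 0, 1, 1]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(instructions, asked_icr)

print(clf.predict(["put it somewhere over there"]))  # likely to trigger a clarification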
What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis
Schlangen, David
Even in our increasingly text-intensive times, the primary site of language use is situated, co-present interaction. It is primary ontogenetically and phylogenetically, and it is arguably also still primary in negotiating everyday social situations. Situated interaction is also the final frontier of Natural Language Processing, where, compared to the area of text processing, very little progress has been made in the past decade, and where a myriad of practical applications is waiting to be unlocked. While the usual approach in the field is to reach, bottom-up, for the ever next "adjacent possible", in this paper I attempt a top-down analysis of the demands that unrestricted situated interaction makes on the participating agent, and suggest ways in which this analysis can structure computational models and research on them. Specifically, I discuss representational demands (the building up and application of a world model, language model, situation model, discourse model, and agent model) and what I call anchoring processes (incremental processing, incremental learning, conversational grounding, multimodal grounding) that bind the agent to the here, now, and us.
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
Schlangen, David
"This paper introduces a new task and a new dataset", "we improve the state of the art in X by Y" -- it is rare to find a current natural language processing paper (or AI paper more generally) that does not contain such statements. What is mostly left implicit, however, is the assumption that this necessarily constitutes progress, and what it constitutes progress towards. Here, we make more precise the normally impressionistically used notions of language task and language game and ask how a research programme built on these might make progress towards the goal of modelling general language competence.
Placing Objects in Gesture Space: Toward Incremental Interpretation of Multimodal Spatial Descriptions
Han, Ting (Bielefeld University) | Kennington, Casey (Boise State University) | Schlangen, David (Bielefeld University)
When describing routes not in the current environment, a common strategy is to anchor the description in configurations of salient landmarks, complementing the verbal descriptions by "placing" the non-visible landmarks in the gesture space. Understanding such multimodal descriptions and later locating the landmarks in the real world is a challenging task for the hearer, who must interpret speech and gestures in parallel, fuse information from both modalities, build a mental representation of the description, and ground the knowledge to real-world landmarks. In this paper, we model the hearer's task, using a multimodal spatial description corpus we collected. To reduce the variability of verbal descriptions, we simplified the setup to use simple objects as landmarks. We describe a real-time system to evaluate the separate and joint contribution of the modalities. We show that gestures not only improve overall system performance, even though they largely encode redundant information, but also lead to earlier final correct interpretations. Being able to build and apply representations incrementally will be of use in more dialogical settings, we argue, where it can enable immediate clarification in cases of mismatch.
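One way to picture the joint contribution of the modalities is a simple late-fusion scheme: each modality maintains a probability distribution over candidate objects, and the fused interpretation is their normalised product, updated as evidence accumulates. The sketch below assumes this scheme for illustration; it is not necessarily the fusion method used in the paper, and the candidate names and numbers are invented.

# Minimal sketch, assuming a late-fusion scheme (not necessarily the paper's):
# each modality yields a distribution over candidate objects, and the fused
# interpretation is their normalised product, recomputed incrementally.
def fuse(speech_dist, gesture_dist):
    """Combine two distributions over the same set of candidate objects."""
    joint = {obj: speech_dist[obj] * gesture_dist[obj] for obj in speech_dist}
    z = sum(joint.values()) or 1.0
    return {obj: p / z for obj, p in joint.items()}

# Hypothetical candidates and per-modality beliefs after a partial utterance.
speech = {"red_cross": 0.5, "blue_circle": 0.3, "green_square": 0.2}
gesture = {"red_cross": 0.7, "blue_circle": 0.2, "green_square": 0.1}

print(fuse(speech, gesture))  # gesture evidence sharpens the speech-only estimate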
Turn-Taking and Coordination in Human-Machine Interaction
Andrist, Sean (University of Wisconsin-Madison) | Bohus, Dan (Microsoft) | Mutlu, Bilge (University of Wisconsin-Madison) | Schlangen, David (Bielefeld University)
This issue of AI Magazine brings together a collection of articles on challenges, mechanisms, and research progress in turn-taking and coordination between humans and machines. The contributing authors work in interrelated fields of spoken dialog systems, intelligent virtual agents, human-computer interaction, human-robot interaction, and semiautonomous collaborative systems and explore core concepts in coordinating speech and actions with virtual agents, robots, and other autonomous systems. Several of the contributors participated in the AAAI Spring Symposium on Turn-Taking and Coordination in Human-Machine Interaction, held in March 2015, and several articles in this issue are extensions of work presented at that symposium. The articles in the collection address key modeling, methodological, and computational challenges in achieving effective coordination with machines, propose solutions that overcome these challenges under sensory, cognitive, and resource restrictions, and illustrate how such solutions can facilitate coordination across diverse and challenging domains. The contributions highlight turn-taking and coordination in human-machine interaction as an emerging and evolving research area with important implications for future applications of AI.
The Power of a Glance: Evaluating Embodiment and Turn-Tracking Strategies of an Active Robotic Overhearer
Kousidis, Spyros (Bielefeld University) | Schlangen, David (Bielefeld University)
Side-participants (SPs) in multiparty dialogue establish and maintain their status as currently non-contributing, but integrated partners of the conversation by continuing to track, and be seen to be tracking, the conversation. To investigate strategies for realising such ‘active side-participant’ behaviour, we constructed an experimental setting where a humanoid robot appeared to track (overhear) a two-party conversation coming out of loudspeakers. We equipped the robot with ‘eyes’ (small displays) with movable pupils, to be able to separately control head-turning and gaze. Using information from the pre-processed conversations, we tested various strategies (random, reactive, predictive) for controlling gaze and head-turning. We asked human raters to judge videos of such tracking behaviour of the robot, and found that strategies making use of independent control of gaze and head direction were significantly preferred. Moreover, the ‘sensible’ strategies (reactive, predictive) were reliably distinguished from the baseline (random turning). We take this as indication that gaze is an important, semi-independent modality, and that our paradigm of off-line evaluation of overhearer behaviour using recorded interactions is a promising one for cost-effective study of more sophisticated tracking models, and can stand as a proxy for testing models of actual side-participants (whose presence would be known, and would influence, the conversation they are part of).
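Reduced to its bare bones, the contrast between the three strategy types can be sketched as rules for choosing which speaker the robot should face at a given moment. The snippet below is only an illustration of that contrast; the lookahead value, the turn annotations, and the function names are assumptions, not the experiment's actual parameters or implementation.

# Minimal sketch of the three strategy types (random, reactive, predictive),
# reduced to picking which speaker the robot's head should face at time t.
# Turn annotations and the lookahead are illustrative assumptions.
import random

def random_strategy(t, turns):
    return random.choice(["A", "B"])

def reactive_strategy(t, turns):
    # Face whoever is speaking at time t; turns is a list of (start, end, speaker).
    for start, end, speaker in turns:
        if start <= t < end:
            return speaker
    return None

def predictive_strategy(t, turns, lookahead=0.5):
    # Face whoever will be speaking shortly, anticipating the turn change.
    return reactive_strategy(t + lookahead, turns)

turns = [(0.0, 2.0, "A"), (2.2, 4.0, "B")]
print(reactive_strategy(1.9, turns), predictive_strategy(1.9, turns))  # A B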