Marge, Matthew
Human-Robot Dialogue Annotation for Multi-Modal Common Ground
Bonial, Claire, Lukin, Stephanie M., Abrams, Mitchell, Baker, Anthony, Donatelli, Lucia, Foots, Ashley, Hayes, Cory J., Henry, Cassidy, Hudson, Taylor, Marge, Matthew, Pollard, Kimberly A., Artstein, Ron, Traum, David, Voss, Clare R.
In this paper, we describe the development of symbolic representations annotated on human-robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners. A particular challenge for establishing common ground arises in remote dialogue (as in disaster-relief or search-and-rescue tasks), where a human and robot are engaged in a joint navigation and exploration task in an unfamiliar environment, but where the robot cannot immediately share high-quality visual information due to communication constraints. Engaging in dialogue provides an effective way to communicate, while on-demand or lower-quality visual information can supplement the dialogue to establish common ground. Within this paradigm, we capture the propositional semantics and illocutionary force of a single utterance within the dialogue through our Dialogue-AMR annotation, an augmentation of Abstract Meaning Representation. We then capture patterns in how different utterances within and across speaker floors relate to one another through our multi-floor Dialogue Structure annotation schema. Finally, we begin to annotate and analyze the ways in which the visual modalities provide contextual information to the dialogue for overcoming disparities in the collaborators' understanding of the environment. We conclude by discussing the use cases, architectures, and systems we have implemented from our annotations that enable physical robots to autonomously engage with humans in bi-directional dialogue and navigation.
SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus
Lukin, Stephanie M., Bonial, Claire, Marge, Matthew, Hudson, Taylor, Hayes, Cory J., Pollard, Kimberly A., Baker, Anthony, Foots, Ashley N., Artstein, Ron, Gervits, Felix, Abrams, Mitchell, Henry, Cassidy, Donatelli, Lucia, Leuski, Anton, Hill, Susan G., Traum, David, Voss, Clare R.
We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging 320 utterances per dialogue. The dialogues are aligned with the multi-modal data streams available during the experiments: 5,785 images and 30 maps. The corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR to identify the speaker's intent and meaning within an utterance, and with Transactional Units and Relations to track relationships between utterances to reveal patterns of the Dialogue Structure. We describe how the corpus and its annotations have been used to develop autonomous human-robot systems and enable research in open questions of how humans speak to robots. We release this corpus to accelerate progress in autonomous, situated, human-robot dialogue, especially in the context of navigation tasks where details about the environment need to be discovered.
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Kennington, Casey, Alikhani, Malihe, Pon-Barry, Heather, Atwell, Katherine, Bisk, Yonatan, Fried, Daniel, Gervits, Felix, Han, Zhao, Inan, Mert, Johnston, Michael, Korpan, Raj, Litman, Diane, Marge, Matthew, Matuszek, Cynthia, Mead, Ross, Mohan, Shiwali, Mooney, Raymond, Parde, Natalie, Sinapov, Jivko, Stewart, Angela, Stone, Matthew, Tellex, Stefanie, Williams, Tom
The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces but speech interfaces, and not just with computers but with all machines, including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals: the first focused on education, the second on benchmarks, and the third on modeling language for spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.
Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop
Marge, Matthew, Espy-Wilson, Carol, Ward, Nigel
With robotics rapidly advancing, more effective human-robot interaction is increasingly needed to realize the full potential of robots for society. While spoken language must be part of the solution, our ability to provide spoken language interaction capabilities is still very limited. The National Science Foundation accordingly convened a workshop, bringing together speech, language, and robotics researchers to discuss what needs to be done. The result is this report, in which we identify key scientific and engineering advances needed. Our recommendations broadly relate to eight general themes. First, meeting human needs requires addressing new challenges in speech technology and user experience design. Second, this requires better models of the social and interactive aspects of language use. Third, for robustness, robots need higher-bandwidth communication with users and better handling of uncertainty, including simultaneous consideration of multiple hypotheses and goals. Fourth, more powerful adaptation methods are needed, to enable robots to communicate in new environments, for new tasks, and with diverse user populations, without extensive re-engineering or the collection of massive training data. Fifth, since robots are embodied, speech should function together with other communication modalities, such as gaze, gesture, posture, and motion. Sixth, since robots operate in complex environments, speech components need access to rich yet efficient representations of what the robot knows about objects, locations, noise sources, the user, and other humans. Seventh, since robots operate in real time, their speech and language processing components must also. Eighth, in addition to more research, we need more work on infrastructure and resources, including shareable software modules and internal interfaces, inexpensive hardware, baseline systems, and diverse corpora.
Towards Overcoming Miscommunication in Situated Dialogue by Asking Questions
Marge, Matthew (Carnegie Mellon University), Rudnicky, Alexander I. (Carnegie Mellon University)
Situated dialogue is prominent in the robot navigation task, where a human gives route instructions (i.e., a sequence of navigation commands) to an agent. We propose an approach whereby situated dialogue agents use strategies such as asking questions to repair or recover from unclear instructions, namely those that an agent misunderstands or considers ambiguous. As an immediate step in this work, we study examples from existing human-human dialogue corpora and relate them to our proposed approach.
Instruction Taking in the TeamTalk System
Rudnicky, Alexander I. (Carnegie Mellon University), Pappu, Aasish (Carnegie Mellon University), Li, Peng (Carnegie Mellon University), Marge, Matthew (Carnegie Mellon University), Frisch, Benjamin (Carnegie Mellon University)
TeamTalk is a dialogue framework that supports multi-participant spoken interaction between humans and robots in a task-oriented setting requiring cooperation and coordination among team members. This paper describes features recently added to the system, in particular the ability for robots to accept and remember location labels and to learn action sequences. These capabilities reflect the incorporation of an ontology and an instruction-understanding component into the system.