
 Artstein, Ron


Human-Robot Dialogue Annotation for Multi-Modal Common Ground

arXiv.org Artificial Intelligence

In this paper, we describe the development of symbolic representations annotated on human-robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners. A particular challenge for establishing common ground arises in remote dialogue (occurring in disaster relief or search-and-rescue tasks), where a human and robot are engaged in a joint navigation and exploration task of an unfamiliar environment, but where the robot cannot immediately share high-quality visual information due to limited communication constraints. Engaging in a dialogue provides an effective way to communicate, and on-demand or lower-quality visual information can supplement the dialogue to establish common ground. Within this paradigm, we capture propositional semantics and the illocutionary force of a single utterance within the dialogue through our Dialogue-AMR annotation, an augmentation of Abstract Meaning Representation. We then capture patterns in how different utterances within and across speaker floors relate to one another in our development of a multi-floor Dialogue Structure annotation schema. Finally, we begin to annotate and analyze the ways in which the visual modalities provide contextual information to the dialogue for overcoming disparities in the collaborators' understanding of the environment. We conclude by discussing the use-cases, architectures, and systems we have implemented from our annotations that enable physical robots to autonomously engage with humans in bi-directional dialogue and navigation.


SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus

arXiv.org Artificial Intelligence

We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging 320 utterances per dialogue. The dialogues are aligned with the multi-modal data streams available during the experiments: 5,785 images and 30 maps. The corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR to identify the speaker's intent and meaning within an utterance, and with Transactional Units and Relations to track relationships between utterances to reveal patterns of the Dialogue Structure. We describe how the corpus and its annotations have been used to develop autonomous human-robot systems and enable research in open questions of how humans speak to robots. We release this corpus to accelerate progress in autonomous, situated, human-robot dialogue, especially in the context of navigation tasks where details about the environment need to be discovered.


Listen to My Body: Does Making Friends Help Influence People?

AAAI Conferences

We investigate the effect of relational dialogue on creating rapport and exerting social influence in human-robot conversation, by comparing interactions with and without a relational component, and with different agent types. Human participants interact with two agents, a Nao robot and a virtual human, in four dialogue scenarios: one involving building familiarity, and three involving sharing information and persuasion in item-ranking tasks. Results show that both agents influence human decision-making; that people prefer interacting with the robot, feel higher rapport with the robot, and believe the robot has more influence; and that the objective influence of the agent on the person is increased by building familiarity, but is not significantly different between the agents.


Ethics for a Combined Human-Machine Dialogue Agent

AAAI Conferences

We discuss philosophical and ethical issues that arise from a dialogue system intended to portray a real person, using recordings of the person together with a machine agent that selects recordings during a synchronous conversation with a user. System output may count as actions of the speaker if the speaker intends to communicate with users and the outputs represent what the speaker would have chosen to say in context; in such cases the system can justifiably be said to be holding a conversation that is offset in time. The autonomous agent may at times misrepresent the speaker's intentions, and such failures are analogous to good-faith misunderstandings. The user may or may not need to be informed that the speaker is not organically present, depending on the application.


Virtual Humans for Learning

AI Magazine

Virtual humans are computer-generated characters designed to look and behave like real people. Studies have shown that virtual humans can mimic many of the social effects that one finds in human-human interactions, such as creating rapport, and people respond to virtual humans in ways that are similar to how they respond to real people. We believe that virtual humans represent a new metaphor for interacting with computers, one in which working with a computer becomes much like interacting with a person, and this can bring social elements to the interaction that are not easily supported with conventional interfaces. We present two systems that embody these ideas. The first, the Twins, are virtual docents in the Museum of Science, Boston, designed to engage visitors and raise their awareness and knowledge of science. The second, SimCoach, uses an empathetic virtual human to provide veterans and their families with information about PTSD and depression.


Augmenting Conversational Characters with Generated Question-Answer Pairs

AAAI Conferences

We take a conversational character trained on a set of linked question-answer pairs authored by hand, and augment its training data by adding sets of question-answer pairs which are generated automatically from texts on different topics. The augmented characters can answer questions about the new topics, at the cost of some performance loss on questions about the topics that the original character was trained to answer.


Improving Spoken Dialogue Understanding Using Phonetic Mixture Models

AAAI Conferences

Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.