
Collaborating Authors: Traum, David


Human-Robot Dialogue Annotation for Multi-Modal Common Ground

arXiv.org Artificial Intelligence

In this paper, we describe the development of symbolic representations annotated on human-robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners. A particular challenge for establishing common ground arises in remote dialogue (occurring in disaster relief or search-and-rescue tasks), where a human and robot are engaged in a joint navigation and exploration task in an unfamiliar environment, but where the robot cannot immediately share high-quality visual information because of communication constraints. Engaging in a dialogue provides an effective way to communicate, while on-demand or lower-quality visual information can supplement the dialogue to help establish common ground. Within this paradigm, we capture the propositional semantics and illocutionary force of a single utterance within the dialogue through our Dialogue-AMR annotation, an augmentation of Abstract Meaning Representation. We then capture patterns in how different utterances within and across speaker floors relate to one another in our development of a multi-floor Dialogue Structure annotation schema. Finally, we begin to annotate and analyze the ways in which the visual modalities provide contextual information to the dialogue for overcoming disparities in the collaborators' understanding of the environment. We conclude by discussing the use-cases, architectures, and systems we have implemented from our annotations that enable physical robots to autonomously engage with humans in bi-directional dialogue and navigation.
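To make concrete what it means to layer illocutionary force over propositional content, the following is a minimal Python sketch of how a single Dialogue-AMR-style annotation might be represented in code. The field names, the PENMAN-style graph string, and the speech-act label are illustrative assumptions for this sketch, not the actual schema defined in the paper.

```python
from dataclasses import dataclass

@dataclass
class DialogueAMRAnnotation:
    """One utterance paired with its propositional content and illocutionary force.
    Field names and values are illustrative only, not the published schema."""
    speaker: str      # e.g. "commander" or "robot"
    utterance: str    # raw transcribed text
    amr: str          # PENMAN-style graph for the propositional content
    speech_act: str   # illocutionary force, e.g. "command", "assert", "question"

# A hypothetical annotation of a navigation instruction.
example = DialogueAMRAnnotation(
    speaker="commander",
    utterance="Move forward three feet.",
    amr="""(m / move-01
      :ARG1 (r / robot)
      :direction (f / forward)
      :extent (d / distance-quantity :quant 3 :unit (f2 / foot)))""",
    speech_act="command",
)

print(example.speech_act, "->", example.utterance)
```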


SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus

arXiv.org Artificial Intelligence

We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging 320 utterances per dialogue. The dialogues are aligned with the multi-modal data streams available during the experiments: 5,785 images and 30 maps. The corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR to identify the speaker's intent and meaning within an utterance, and with Transactional Units and Relations to track relationships between utterances and reveal patterns of the Dialogue Structure. We describe how the corpus and its annotations have been used to develop autonomous human-robot systems and enable research on open questions of how humans speak to robots. We release this corpus to accelerate progress in autonomous, situated, human-robot dialogue, especially in the context of navigation tasks where details about the environment need to be discovered.
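As a rough illustration of how a multi-modal, annotation-aligned corpus like SCOUT might be consumed, the sketch below iterates over a hypothetical JSON-lines export in which each utterance record carries a dialogue ID, speaker, text, and identifiers of aligned images. The file name and field names are assumptions made for this sketch and do not describe the released corpus format.

```python
import json
from collections import Counter

def load_utterances(path):
    """Yield utterance records from a hypothetical JSON-lines export."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def summarize(path):
    """Count utterances per dialogue and how many reference an aligned image."""
    per_dialogue = Counter()
    with_images = 0
    for utt in load_utterances(path):
        per_dialogue[utt["dialogue_id"]] += 1
        if utt.get("image_ids"):  # aligned photographs, if any
            with_images += 1
    print(f"{sum(per_dialogue.values())} utterances in {len(per_dialogue)} dialogues")
    print(f"{with_images} utterances aligned with at least one image")

# summarize("scout_utterances.jsonl")  # hypothetical file name
```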


TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models

arXiv.org Artificial Intelligence

Classical planning formulations like the Planning Domain Definition Language (PDDL) admit action sequences guaranteed to achieve a goal state given an initial state, if any are possible. However, reasoning problems defined in PDDL do not capture temporal aspects of action taking, for example that two agents in the domain can execute an action simultaneously if the postconditions of each do not interfere with the preconditions of the other. A human expert can decompose a goal into largely independent constituent parts and assign each agent to one of these subgoals to take advantage of simultaneous actions for faster execution of plan steps, each using only single-agent planning. By contrast, large language models (LLMs) used for directly inferring plan steps do not guarantee execution success, but do leverage commonsense reasoning to assemble action sequences. We combine the strengths of classical planning and LLMs by approximating human intuitions for two-agent planning goal decomposition. We demonstrate that LLM-based goal decomposition leads to faster planning times than solving multi-agent PDDL problems directly, while simultaneously achieving fewer plan execution steps than a single-agent plan alone and preserving execution success. Additionally, we find that LLM-based approximations of subgoals can achieve multi-agent execution step counts similar to those specified by human experts. Website and resources at https://glamor-usc.github.io/twostep
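The pipeline described above, an LLM decomposing a goal into per-agent subgoals that are then solved by single-agent classical planning, can be sketched as follows. Here `llm_decompose_goal` is a placeholder for an LLM call and `classical_plan` is a placeholder for a PDDL solver invocation; both, along with the naive interleaving of the resulting plans, are simplifying assumptions made for illustration rather than the paper's implementation.

```python
def llm_decompose_goal(goal: str) -> list[str]:
    """Placeholder for an LLM call that splits a goal into two largely
    independent subgoals, one per agent. Hard-coded here for illustration."""
    return [f"{goal} (agent 1's portion)", f"{goal} (agent 2's portion)"]

def classical_plan(subgoal: str) -> list[str]:
    """Placeholder for a single-agent classical planner over a PDDL problem.
    Returns an ordered list of action names."""
    return [f"step-1 for '{subgoal}'", f"step-2 for '{subgoal}'"]

def two_step_plan(goal: str) -> list[tuple[str, str]]:
    """Decompose with the LLM, plan each subgoal independently, then interleave
    the single-agent plans so the two agents can act in parallel."""
    subgoals = llm_decompose_goal(goal)
    plans = [classical_plan(sg) for sg in subgoals]
    merged = []
    for step in range(max(len(p) for p in plans)):
        for agent, plan in enumerate(plans, start=1):
            if step < len(plan):
                merged.append((f"agent-{agent}", plan[step]))
    return merged

for agent, action in two_step_plan("set the dinner table"):
    print(agent, action)
```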


Navigating to Success in Multi-Modal Human-Robot Collaboration: Analysis and Corpus Release

arXiv.org Artificial Intelligence

Human-guided robotic exploration is a useful approach to gathering information at remote locations, especially those that might be too risky, inhospitable, or inaccessible for humans. Maintaining common ground between the remotely-located partners is a challenge, one that can be facilitated by multi-modal communication. In this paper, we explore how participants utilized multiple modalities to investigate a remote location with the help of a robotic partner. Participants issued spoken natural language instructions and received from the robot text-based feedback, continuous 2D LIDAR mapping, and upon-request static photographs. We observed that participants adopted different strategies in their use of the modalities, and hypothesize that these differences may be correlated with success at several exploration sub-tasks. We found that requesting photos may have improved the identification and counting of some key entities (doorways in particular) and that this strategy did not hinder the amount of overall area explored. Future work with larger samples may reveal the effects of more nuanced photo and dialogue strategies, which can inform the training of robotic agents. Additionally, we announce the release of our unique multi-modal corpus of human-robot communication in an exploration context: SCOUT, the Situated Corpus Of Understanding Transactions.
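The comparison between participants who requested photos and those who did not can be thought of as a simple between-group analysis, sketched below with invented example records; the column names and values are illustrative only and bear no relation to the actual corpus statistics or the paper's analysis.

```python
from statistics import mean

# Invented example records, one per participant, for illustration only.
participants = [
    {"requested_photos": True,  "doorways_found": 9, "area_explored_m2": 410},
    {"requested_photos": True,  "doorways_found": 8, "area_explored_m2": 395},
    {"requested_photos": False, "doorways_found": 6, "area_explored_m2": 400},
    {"requested_photos": False, "doorways_found": 5, "area_explored_m2": 420},
]

def group_means(records, key):
    """Mean of `key` for the photo-requesting and non-requesting groups."""
    with_photos = [r[key] for r in records if r["requested_photos"]]
    without = [r[key] for r in records if not r["requested_photos"]]
    return mean(with_photos), mean(without)

for key in ("doorways_found", "area_explored_m2"):
    w, wo = group_means(participants, key)
    print(f"{key}: with photos {w:.1f} vs. without {wo:.1f}")
```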


Interactive Evaluation of Dialog Track at DSTC9

arXiv.org Artificial Intelligence

The ultimate goal of dialog research is to develop systems that can be effectively used in interactive settings by real users. To this end, we introduced the Interactive Evaluation of Dialog Track at the 9th Dialog System Technology Challenge. This track consisted of two sub-tasks. The first sub-task involved building knowledge-grounded response generation models. The second sub-task aimed to extend dialog models beyond static datasets by assessing them in an interactive setting with real users. Our track challenges participants to develop strong response generation models and explore strategies that extend them to back-and-forth interactions with real users. The progression from static corpora to interactive evaluation introduces unique challenges and facilitates a more thorough assessment of open-domain dialog systems. This paper provides an overview of the track, including the methodology and results, and offers insights into how best to evaluate open-domain dialog models.


Using Reinforcement Learning to Manage Communications Between Humans and Artificial Agents in an Evacuation Scenario

AAAI Conferences

In search and rescue missions, robots can potentially help save survivors faster than human emergency responders alone would. In our experimental virtual-reality simulation environment, we have a system that comprises a swarm of unmanned aerial vehicles (UAVs) and a virtual "spokesperson". The system and a human operator work together on locating survivors and guiding them to safety, away from an active wildfire encroaching on a small town. The UAVs and the spokesperson are equipped with natural language capabilities through which they can communicate with the survivors to convince them to evacuate. If they fail to do so, they can ask the human operator to intervene. We use reinforcement learning to automatically learn a policy to be followed when a UAV has located survivors. The system learns the best course of action to help the survivors evacuate, i.e., warn them through the UAV or the spokesperson, ask the human operator to intervene if needed, guide them to safety via their preferred method of transportation, or just wait for more information. We vary the distance of the fire, the level of cooperativeness of the survivors, and how busy the human operator is, and we report results in terms of the percentage of survivors saved in each condition.
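A tabular Q-learning loop over a state and action space like the one described (fire proximity, survivor cooperativeness, and operator busyness as state; warning via the UAV or spokesperson, asking the operator, or waiting as actions) might look as follows. The state encoding, reward values, and simulator stub are assumptions made for this sketch, not the paper's actual formulation, which runs against the VR simulation.

```python
import random
from collections import defaultdict

ACTIONS = ["warn_via_uav", "warn_via_spokesperson", "ask_operator", "wait"]

def simulate(state, action):
    """Stub environment returning (reward, next_state, done).
    A real setup would query the VR simulation; the values here are invented."""
    fire_near, cooperative, operator_busy = state
    if action == "ask_operator" and not operator_busy:
        return 1.0, state, True    # the operator persuades the survivors
    if action.startswith("warn") and cooperative:
        return 1.0, state, True    # survivors evacuate after the warning
    if action == "wait" and fire_near:
        return -1.0, state, True   # waited too long with the fire close by
    return -0.1, state, False      # ineffective turn, small cost

def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=10):
    q = defaultdict(float)         # (state, action) -> estimated value
    for _ in range(episodes):
        state = (random.random() < 0.5,   # fire nearby?
                 random.random() < 0.5,   # survivors cooperative?
                 random.random() < 0.5)   # operator busy?
        for _ in range(max_steps):
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            reward, next_state, done = simulate(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

policy = q_learning()
```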


Listen to My Body: Does Making Friends Help Influence People?

AAAI Conferences

We investigate the effect of relational dialogue on creating rapport and exerting social influence in human-robot conversation, by comparing interactions with and without a relational component, and with different agent types. Human participants interact with two agents, a Nao robot and a virtual human, in four dialogue scenarios: one involving building familiarity, and three involving sharing information and persuasion in item-ranking tasks. Results show that both agents influence human decision-making; that people prefer interacting with the robot, feel higher rapport with it, and believe it has more influence; and that the agent's objective influence on the person is increased by building familiarity, but does not differ significantly between the agents.



Mr. Clue — A Virtual Agent that Can Play Word-Guessing Games

AAAI Conferences

This demonstration showcases a virtual agent, Mr. Clue, capable of acting in the role of clue-giver in a word-guessing game. The agent has the ability to automatically generate clues and update its dialogue policy dynamically based on user input.
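A toy clue-giving loop can illustrate the general shape of such an agent, giving progressively more direct clues and reacting to the guesser's responses; the word list, clue ordering, and matching rule below are invented for illustration and do not reflect Mr. Clue's actual clue-generation or policy-update mechanisms.

```python
# Toy clue-giver: offers increasingly direct clues for a target word.
# Purely illustrative; not Mr. Clue's actual mechanism.
CLUES = {
    "piano": [
        "It is a musical instrument.",
        "It has black and white keys.",
        "You often sit on a bench and play it with both hands.",
    ],
}

def play_round(target: str, guesses: list[str]) -> bool:
    """Give a clue, read a guess, and stop when the word is guessed."""
    for clue, guess in zip(CLUES[target], guesses):
        print("CLUE: ", clue)
        print("GUESS:", guess)
        if guess.lower() == target:
            print("Correct!")
            return True
    print("Out of clues; the word was", target)
    return False

play_round("piano", ["guitar", "keyboard", "piano"])
```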


Virtual Humans for Learning

AI Magazine

Virtual humans are computer-generated characters designed to look and behave like real people. Studies have shown that virtual humans can mimic many of the social effects found in human-human interactions, such as creating rapport, and that people respond to virtual humans in ways similar to how they respond to real people. We believe that virtual humans represent a new metaphor for interacting with computers, one in which working with a computer becomes much like interacting with a person; this can bring social elements to the interaction that are not easily supported with conventional interfaces. One such application, SimCoach, uses an empathetic virtual human to provide veterans and their families with information about PTSD and depression.