Chai, Joyce Y.
Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work
Park, Somin, Wang, Xi, Menassa, Carol C., Kamat, Vineet R., Chai, Joyce Y.
The introduction of robots is widely considered to have significant potential to alleviate the worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise in combining human workers' flexibility with robot assistants' physical abilities to jointly address the uncertainties inherent in construction work. When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and to establish a natural and intuitive communication system between human workers and robotic assistants. Natural language-based interaction can enable intuitive and familiar communication with robots for human workers who are non-experts in robot programming. However, limited research has been conducted on this topic in construction. This paper proposes a framework that allows human workers to interact with construction robots through natural language instructions. The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). In the NLU module, natural language instructions are input to a language model that predicts a tag for each word. The IM module uses the NLU output together with building component information to generate the final instructional output a robot needs to acknowledge and perform the construction task. A case study on drywall installation is conducted to evaluate the proposed approach. The results highlight the potential of natural language-based interaction to replicate the communication that occurs between human workers within the context of human-robot teams.
X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust
Akula, Arjun R., Liu, Changsong, Saba-Sadiya, Sari, Lu, Hongjing, Todorovic, Sinisa, Chai, Joyce Y., Zhu, Song-Chun
We present a new explainable AI (XAI) framework aimed at increasing justified human trust in and reliance on an AI machine through explanations. We pose explanation as an iterative communication process, i.e., a dialog, between the machine and the human user. More concretely, the machine generates a sequence of explanations in a dialog that takes into account three important aspects at each dialog turn: (a) the human's intention (or curiosity); (b) the human's understanding of the machine; and (c) the machine's understanding of the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. In other words, these explicit mental representations in ToM are incorporated to learn an optimal explanation policy that takes into account the human's perception and beliefs. Furthermore, we show that ToM facilitates quantitatively measuring justified human trust in the machine by comparing all three mental representations. We applied our framework to three visual recognition tasks, namely image classification, action recognition, and human body pose estimation. We argue that our ToM-based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human study experiments verify our hypotheses, showing that the proposed explanations significantly outperform state-of-the-art XAI methods on all the standard quantitative and qualitative XAI evaluation metrics, including human trust, reliance, and explanation satisfaction.
Collaborative Language Grounding Toward Situated Human-Robot Dialogue
Chai, Joyce Y. (Michigan State University) | Fang, Rui (Thomson Reuters) | Liu, Changsong (Michigan State University) | She, Lanbo (Michigan State University)
To enable situated human-robot dialogue, techniques to support grounded language communication are essential. One particular challenge is to ground human language to robot internal representation of the physical world. Although copresent in a shared environment, humans and robots have mismatched capabilities in reasoning, perception, and action. Their representations of the shared environment and joint tasks are significantly misaligned. Humans and robots will need to make extra effort to bridge the gap and strive for a common ground of the shared world. Only then, is the robot able to engage in language communication and joint tasks. Thus computational models for language grounding will need to take collaboration into consideration. A robot not only needs to incorporate collaborative effort from human partners to better connect human language to its own representation, but also needs to make extra collaborative effort to communicate its representation in language that humans can understand. To address these issues, the Language and Interaction Research group (LAIR) at Michigan State University has investigated multiple aspects of collaborative language grounding. This article gives a brief introduction to this research effort and discusses several collaborative approaches to grounding language to perception and action.
What’s Hot in Human Language Technology: Highlights from NAACL HLT 2015
Chai, Joyce Y. (Michigan State University) | Sarkar, Anoop (Simon Fraser University) | Mihalcea, Rada (University of Michigan)
The Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technology (NAACL HLT) is a premier conference reporting outstanding research on human language technology. Several discriminative models with latent variables were also explored to learn better alignment models in a wetlab experiment domain (Naim et al. 2015). As alignment is often the first step in many problems involving language and vision, these approaches and empirical results provide important insights.
Task Learning through Visual Demonstration and Situated Dialogue
Liu, Changsong (Michigan State University) | Chai, Joyce Y. (Michigan State University) | Shukla, Nishant (University of California, Los Angeles) | Zhu, Song-Chun (University of California, Los Angeles)
To enable effective collaboration between humans and cognitive robots, it is important for robots to continuously acquire task knowledge from human partners. To address this issue, we are currently developing a framework that supports task learning through visual demonstration and natural language dialogue. One core component of this framework is the integration of language and vision, driven by dialogue, for task knowledge learning. This paper describes our ongoing effort, in particular grounded task learning through joint processing of video and dialogue using And-Or Graphs (AOG).
Collaborative Models for Referring Expression Generation in Situated Dialogue
Fang, Rui (Michigan State University) | Doering, Malcolm (Michigan State University) | Chai, Joyce Y. (Michigan State University)
In situated dialogue with artificial agents (e.g., robots), although a human and an agent are co-present, the agent's representation and the human's representation of the shared environment are significantly mismatched. Because of this misalignment, our previous work has shown that when the agent applies traditional approaches to generate referring expressions that describe target objects with minimum descriptions, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, we have developed two collaborative models - an episodic model and an installment model - for referring expression generation. Instead of generating a single referring expression to describe a target object as in previous work, both models generate multiple small expressions that lead to the target object, with the goal of minimizing the collaborative effort. In particular, our installment model incorporates human feedback in a reinforcement learning framework to learn optimal generation strategies. Our empirical results show that the episodic model and the installment model outperform previous non-collaborative models with absolute gains of 6% and 21%, respectively.
Ambiguities in Spatial Language Understanding in Situated Human Robot Dialogue
Liu, Changsong (Michigan State University) | Walker, Jacob (Michigan State University) | Chai, Joyce Y. (Michigan State University)
In human robot dialogue, identifying intended referents from human partners' spatial language is challenging. This is partly due to the automated inference of a potentially ambiguous underlying reference system (i.e., frame of reference). To improve spatial language understanding, we conducted an empirical study to investigate the prevalence of frame-of-reference ambiguities. Our findings indicate that ambiguities do arise frequently during human robot dialogues. Although situational factors from the spatial arrangement are less indicative of the underlying reference system, linguistic cues and individual preferences may allow reliable disambiguation.