Wu, Emily (Brown University) | Han, Yuxin (Rhode Island School of Design) | Whitney, David (Brown University) | Oberlin, John (Brown University) | MacGlashan, James (Brown University) | Tellex, Stefanie (Brown University)
Issuing and following instructions is a common task in many forms of both human-human and human-robot collaboration. With two human participants, the accuracy of instruction following increases if the collaborators can monitor the state of their partners and respond to them through conversation (Clark and Krych 2004), a process we call social feedback. Despite this benefit in human-human interaction, current human-robot collaboration systems process instructions in non-incremental batches, an approach that can achieve good accuracy but does not allow for reactive feedback (Tellex et al. 2011; Matuszek et al. 2012; Tellex et al. 2012; Misra et al. 2014). In this paper, we show that giving a robot the ability to ask the user questions results in responsive conversations and allows the robot to quickly determine the object that the user desires. This social feedback loop between person and robot allows a person to build an internal model of the robot’s mental state and adapt their own behavior to better inform the robot. To close the human-robot feedback loop, we employ a Partially Observable Markov Decision Process (POMDP) to produce a policy that identifies the desired object in the shortest amount of time. To test our approach, we perform user studies that measure our robot’s ability to deliver common household items requested by the participant. We compare delivery speed and accuracy both with and without social feedback.
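The question-asking loop described in this abstract can be illustrated as a Bayesian belief update over candidate objects, with the robot asking clarifying questions until one object is likely enough to deliver. This is only a minimal sketch, not the paper's actual POMDP policy; the function names, the confidence threshold, and the small noise floor are all assumptions.

```python
def update_belief(belief, likelihoods):
    """One Bayesian update of the robot's belief over candidate objects.

    belief: dict mapping object -> prior probability
    likelihoods: dict mapping object -> P(user response | object is desired)
    """
    posterior = {o: belief[o] * likelihoods.get(o, 1e-6) for o in belief}
    z = sum(posterior.values())  # normalize so probabilities sum to 1
    return {o: p / z for o, p in posterior.items()}

def choose_action(belief, confidence=0.9):
    """Deliver once one object is sufficiently likely; otherwise ask a
    clarifying question about the best candidate (a stand-in for the
    POMDP policy's question/deliver trade-off)."""
    best = max(belief, key=belief.get)
    if belief[best] >= confidence:
        return ("deliver", best)
    return ("ask", best)
```

For example, starting from a uniform belief over two objects, a user response that strongly favors one of them pushes its posterior to 0.9 and triggers delivery.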
This paper is about generating questions in human-robot interaction. We survey existing work on the forms and meanings of questions in English and discuss the pragmatic effects resulting from an interplay between the choice of syntactic form and intonation. We propose an approach to formalization based on a notion of common ground and commitment, set in a model of situated dialogue as part of collaborative activity where we explicitly model the beliefs and intentions of both the robot and the human. Questions come about by abductively inferring an intentional structure grounded in the belief model and indicating commitments. Content planning and surface realization turn this into a question of the appropriate form.
We are motivated by building a system for an autonomous robot companion that collaborates with a human partner to achieve a common mission. The objective of the robot is to infer the human's preferences over the tasks of the mission, so that it can collaborate by taking on the tasks the human prefers not to do. Inspired by recent research on the recognition of human intention, we propose a unified model that allows the robot to switch accurately between verbal and non-verbal interaction. Our system unifies an epistemic partially observable Markov decision process (POMDP), a human-robot spoken dialog system that disambiguates the human's preferences, with an intuitive human-robot collaboration that infers the human's intention from observed human actions. The beliefs over the human's preferences computed during the dialog are then reinforced in the course of task execution by the intuitive interaction. Our unified model helps the robot infer the human's preferences and decide which tasks to perform to effectively satisfy them. The robot is also able to adjust its plan rapidly in case of sudden changes in the human's preferences and to switch between both kinds of interaction. Experimental results on a scenario inspired by RoboCup@Home illustrate various behaviors of the robot during the collaborative mission.
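The unification of verbal and non-verbal interaction described above can be sketched as a single belief over the human's task preferences, updated from two observation channels: spoken answers and observed human actions. A minimal sketch, assuming hypothetical channel likelihood tables rather than the paper's epistemic POMDP:

```python
def fuse_observation(belief, channel, value, likelihoods):
    """Update the belief over the human's preferred task from either channel.

    channel: "speech" (a dialog answer) or "action" (an observed human action)
    likelihoods: dict channel -> value -> dict preference -> likelihood
    (channel names and likelihood values here are illustrative assumptions)
    """
    lik = likelihoods[channel][value]
    posterior = {pref: belief[pref] * lik.get(pref, 1e-6) for pref in belief}
    z = sum(posterior.values())  # renormalize
    return {pref: p / z for pref, p in posterior.items()}
```

Beliefs formed during the dialog are then sharpened by watching which tasks the human actually starts, mirroring the reinforcement of dialog beliefs by intuitive interaction.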
Intelligent planning algorithms such as the Partially Observable Markov Decision Process (POMDP) have succeeded in dialog management applications (Roy, Pineau, & Thrun 2000; Williams & Young 2005; Williams, Poupart, & Young 2005) because they are robust to uncertainties in human-robot interaction. Like all dialog planning systems, POMDPs require an accurate model of what the user might say and how they wish to interact with the robot. In the POMDP framework, the user's vocabulary and preferences are generally specified using a large probabilistic model with many parameters. While it may be easy for an expert to specify reasonable values for these parameters, gathering data to specify the parameters accurately a priori is expensive. In this paper, we take a Bayesian approach to learning the user model while simultaneously refining the dialog manager's policy. First, we show how to compute the optimal dialog policy with uncertain parameters (in the absence of learning), along with a heuristic that allows the dialog manager to intelligently replan its policy given data from recent interactions. Next, we present a pair of approaches that explicitly consider the robot's uncertainty about the true user model when taking actions; we show that these approaches learn user preferences more robustly. A key contribution of this work is the use of "meta-actions," queries about what the robot should have done, to discover a user's dialog preferences without making mistakes that may potentially annoy the user.
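The "meta-actions" idea, querying what the robot should have done instead of risking an annoying mistake, can be sketched with simple per-state counts. The class below is a hypothetical tabular stand-in for the paper's Bayesian treatment; the names, prior, and query threshold are all assumptions.

```python
from collections import Counter

class MetaActionLearner:
    """Sketch of hindsight ("meta-action") queries: after an ambiguous
    turn, the robot asks which action it *should* have taken, and updates
    per-state preference counts without ever executing a wrong action."""

    def __init__(self, prior=1):
        self.counts = {}      # state -> Counter over preferred actions
        self.prior = prior    # smoothing pseudo-count for unseen actions

    def preferred(self, state, actions):
        """Smoothed probability that each action is the user's preference."""
        c = self.counts.get(state, Counter())
        scores = {a: c[a] + self.prior for a in actions}
        z = sum(scores.values())
        return {a: s / z for a, s in scores.items()}

    def should_query(self, state, actions, threshold=0.7):
        """Ask a meta-action query when no action is clearly preferred."""
        return max(self.preferred(state, actions).values()) < threshold

    def record_answer(self, state, correct_action):
        """Incorporate the user's answer to "what should I have done?"."""
        self.counts.setdefault(state, Counter())[correct_action] += 1
```

After a handful of answers for a given dialog state, the preferred action dominates and the robot stops querying, matching the goal of learning preferences without costly mistakes.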
Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated over the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users in order to assess them as improvements or degradations. This is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, learn off-policy. In this contribution, a sample-efficient, online, and off-policy reinforcement learning algorithm is proposed that learns an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
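Off-policy learning from logged dialogues can be illustrated with a small batch value-iteration sweep over recorded transitions. This is a tabular fitted-Q sketch, not the paper's sample-efficient algorithm; the state and action names are invented for illustration.

```python
def fitted_q_iteration(transitions, actions, gamma=0.95, iters=50):
    """Off-policy batch learning from logged dialogue transitions.

    transitions: list of (state, action, reward, next_state) tuples
    logged under a handcrafted policy; next_state is None at dialogue end.
    A tabular Q-function stands in for a function approximator.
    """
    q = {}
    for _ in range(iters):
        new_q = {}
        for s, a, r, s2 in transitions:
            target = r
            if s2 is not None:
                # Bootstrap from the greedy value of the next state,
                # regardless of which action the logging policy took.
                target += gamma * max(q.get((s2, b), 0.0) for b in actions)
            new_q[(s, a)] = target
        q = new_q
    return q
```

A greedy policy is then read off by taking, in each state, the action with the highest learned Q-value: the learner can thus prefer actions the handcrafted logging policy rarely chose, which is exactly what off-policy learning permits.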