The migration of conversational AI agents across different embodiments in order to maintain the continuity of the task has been recently explored to further improve user experience. However, these migratable agents lack contextual understanding of the user information and the migrated device during the dialog conversations with the user. This opens the question of how an agent might behave when migrated into an embodiment for contextually predicting the next utterance. We collected a dataset from the dialog conversations between crowdsourced workers with the migration context involving personal and non-personal utterances in different settings (public or private) of embodiment into which the agent migrated. We trained the generative and information retrieval models on the dataset using with and without migration context and report the results of both qualitative metrics and human evaluation. We believe that the migration dataset would be useful for training future migratable AI systems.
Can we learn a control policy able to adapt its behaviour in real time so as to take any desired amount of risk? The general Reinforcement Learning framework solely aims at optimising a total reward in expectation, which may not be desirable in critical applications. In stark contrast, the Budgeted Markov Decision Process (BMDP) framework is a formalism in which the notion of risk is implemented as a hard constraint on a failure signal. Existing algorithms solving BMDPs rely on strong assumptions and have so far only been applied to toy-examples. In this work, we relax some of these assumptions and demonstrate the scalability of our approach on two practical problems: a spoken dialogue system and an autonomous driving task. On both examples, we reach similar performances as Lagrangian Relaxation methods with a significant improvement in sample and memory efficiency.
Abeyruwan, Saminda (University of Miami) | Baral, Ramesh (Florida International University) | Yasavur, Ugan (Florida International University) | Lisetti, Christine (Florida International University) | Visser, Ubbo (University of Miami)
We combined a spoken dialog system that we developed to deliver brief health interventions with the fully autonomous humanoid robot (NAO). The dialog system is based on a framework facilitating Markov decision processes (MDP). It is optimized using reinforcement learning (RL) algorithms with data we collected from real user interactions. The system begins to learn optimal dialog strategies for initiative selection and for the type of confirmations that it uses during theinteraction. The health intervention, delivered by a 3D character instead of the NAO, has already been evaluated, with positive results in terms of task completion, ease of use, and future intention to use the system. The current spoken dialog system for the humanoid robot is a novelty and exists so far as a proof ofconcept.
An important difficulty in developing spoken dialogue systems for robots is the open-ended nature of most interactions. Robotic agents must typically operate in complex, continuously changing environments which are difficult to model and do not provide any clear, predefined goal. Directly capturing this complexity in a single, large dialogue policy is thus inadequate. This paper presents a new approach which tackles the complexity of open-ended interactions by breaking it into a set of small, independent policies, which can be activated and deactivated at runtime by a dedicated mechanism. The approach is currently being implemented in a spoken dialogue system for autonomous robots.