We present an approach for learning grounded language from mixed-initiative human-robot interaction. Prior work on learning from human instruction has concentrated on acquisition of task-execution knowledge from domain-specific language. In this work, we demonstrate acquisition of linguistic, semantic, perceptual, and procedural knowledge from mixed-initiative, natural language dialog. Our approach has been instantiated in a cognitive architecture, Soar, and has been deployed on a table-top robotic arm capable of picking up small objects. A preliminary analysis verifies the ability of the robot to acquire diverse knowledge from human-robot interaction.
As natural-language-capable robots and other agents become more commonplace, their ability to understand truly natural human speech is increasingly important. Moreover, these agents must understand such speech in realistic scenarios, in which an agent may be uncertain about its knowledge of its environment and may not know all of the entities that environment contains. As such, I am interested in developing architectural mechanisms that allow robots to understand natural language in uncertain, open worlds. My work toward this goal has focused primarily on two problems: (1) reference resolution and (2) pragmatic reasoning.
Over the last five years, while developing an architecture for autonomous service robots in human environments, we have identified several key decisional issues that must be tackled for a cognitive robot to share space and tasks with a human. We introduce some of them here: situation assessment and mutual modelling; management and exploitation of each agent's (human and robot) knowledge in separate cognitive models; natural multi-modal communication; "human-aware" task planning; and interleaved human-robot plan achievement. As a general take-home message, explicit knowledge management, both symbolic and geometric, proves to be key to addressing these challenges, as it pushes for a different, more semantic way of framing decision-making in human-robot interaction.
Robots deployed in domains characterized by non-deterministic action outcomes and unforeseen changes frequently need considerable knowledge about the domain and the tasks they have to perform. Humans, however, may not have the time or expertise to provide elaborate or accurate domain knowledge, and it may be difficult for robots to obtain many labeled training samples of domain objects and events. For widespread deployment, robots thus need the ability to incrementally and automatically extract relevant domain knowledge from multimodal sensor inputs, acquiring and using human feedback when such feedback is necessary and available. This paper describes a multiple-instance active learning algorithm for such incremental learning in the context of building models of relevant domain objects. We introduce the concept of bag uncertainty, which enables robots to identify the need for feedback and to incrementally revise learned object models by associating visual cues extracted from images with verbal cues extracted from limited high-level human feedback. Images of indoor and outdoor scenes drawn from the IAPR TC-12 benchmark dataset show that our algorithm provides better object recognition accuracy than a state-of-the-art multiple-instance active learning algorithm.
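To make the bag-uncertainty idea concrete, the following is a minimal sketch of how a multiple-instance active learner might score bags and choose one for human feedback. The noisy-OR bag model, the entropy-based uncertainty score, and the greedy query rule are illustrative assumptions for exposition, not the paper's actual formulation.

```python
import math

def bag_positive_prob(instance_probs):
    # Noisy-OR assumption: a bag (e.g., an image) is positive for a
    # target object if at least one instance (e.g., a region) is positive.
    p_all_negative = 1.0
    for p in instance_probs:
        p_all_negative *= (1.0 - p)
    return 1.0 - p_all_negative

def bag_uncertainty(instance_probs):
    # Binary entropy of the bag-level label distribution: maximal when
    # the learner is least sure of the bag's label (probability near 0.5).
    p = bag_positive_prob(instance_probs)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def select_query(bags):
    # Ask the human about the bag whose label is most uncertain,
    # where limited high-level feedback is most informative.
    return max(range(len(bags)), key=lambda i: bag_uncertainty(bags[i]))

# Instance-level probabilities from some current object model (made up).
bags = [
    [0.05, 0.02, 0.10],  # confidently negative bag
    [0.40, 0.35, 0.30],  # ambiguous bag
    [0.95, 0.90],        # confidently positive bag
]
print(select_query(bags))  # prints 1: the ambiguous bag is queried
```

The learner would then incorporate the human's answer as a bag label and retrain, repeating until a query budget is exhausted or uncertainty falls below a threshold.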