High Dimensional Data Fusion via Joint Manifold Learning
Davenport, Mark A. (Stanford University) | Hegde, Chinmay (Rice University) | Duarte, Marco F. (Princeton University) | Baraniuk, Richard G. (Rice University)
The emergence of low-cost sensing architectures for diverse modalities has made it possible to deploy sensor networks that acquire large amounts of very high-dimensional data. To cope with such a data deluge, manifold models are often developed that provide a powerful theoretical and algorithmic framework for capturing the intrinsic structure of data governed by a low-dimensional set of parameters. However, these models do not typically take into account dependencies among multiple sensors. We thus propose a new joint manifold framework for data ensembles that exploits such dependencies. We show that joint manifold structure can lead to improved performance for manifold learning. Additionally, we leverage recent results concerning random projections of manifolds to formulate a universal, network-scalable dimensionality reduction scheme that efficiently fuses the data from all sensors.
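The fusion idea in the abstract above can be illustrated with a minimal sketch. It assumes each sensor applies its own Gaussian random matrix and the network sums the results; the function name, matrix scaling, and toy signals are illustrative assumptions, not the authors' implementation. The sketch shows that summing the per-sensor projections equals a single random projection of the concatenated (joint) signal, which is why no node ever needs the full data.

```python
import numpy as np

def fuse_random_projections(sensor_signals, m, seed=0):
    """Project each sensor's signal to m dimensions and sum the results.

    Hypothetical helper: the Gaussian matrices and 1/sqrt(m) scaling are
    assumptions made for this illustration.
    """
    rng = np.random.default_rng(seed)
    # One independent random matrix per sensor, sized to that sensor's signal.
    projections = [
        rng.standard_normal((m, x.shape[0])) / np.sqrt(m) for x in sensor_signals
    ]
    # Each sensor computes its own m-dimensional measurement locally.
    local = [phi @ x for phi, x in zip(projections, sensor_signals)]
    # The network only has to add m-dimensional vectors together.
    fused = np.sum(local, axis=0)
    return fused, projections

# Toy data: three sensors with different native dimensions.
signals = [np.arange(4.0), np.ones(6), np.linspace(0.0, 1.0, 5)]
fused, projections = fuse_random_projections(signals, m=3)

# Equivalent centralized view: one wide matrix applied to the stacked signal.
joint_matrix = np.hstack(projections)
joint_signal = np.concatenate(signals)
```

Here `fused` matches `joint_matrix @ joint_signal` up to floating-point error, so the distributed sum behaves like one projection of the joint data.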
Learning Grounded Communicative Intent from Human-Robot Dialog
Modayil, Joseph (University of Alberta)
Studying how a robot can learn to communicate with a person provides insight into how communication might be learned in general. Deep models of dialog and communicative intent typically rely on modeling the internal state of the speakers—states that are unobservable by a learning robot. This paper considers how communication can be framed to be learnable from experience. In particular, we describe how an agent might learn to communicate by building on three foundational capabilities, namely 1) an observable signal of satisfied intent (a smile), 2) the ability to imitate perceived actions, and 3) perceptual referents for discourse items. Early simulation results show that an agent can learn some basic communication skills from these foundations.
Learnable Controllers for Adaptive Dialogue Processing Management
Kruijff, Geert-Jan M. (DFKI GmbH) | Krieger, Hans-Ulrich
The paper focuses on how a model could be learnt for determining at runtime how much of the spoken input needs to be understood, and what configuration of processes can be expected to yield that result. Typically, a dialogue system applies a fixed configuration of shallow and deep forms of processing to its input. The configuration tries to balance robustness with depth of understanding, creating a system that always tries to understand as well as it can. The paper adopts a different view, assuming that what needs to be understood can vary per context. To facilitate this any-depth processing, the paper proposes an approach based on learnable controllers. The paper illustrates the main ideas of the approach with examples from a robot acquiring situated dialogue competence and a robot working with users on a task.
A Discriminative Model for Understanding Natural Language Route Directions
Kollar, Thomas (Massachusetts Institute of Technology) | Tellex, Stefanie (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. However, determining the correct sequence of actions in response to a set of spoken instructions is a complex decision-making problem. There is a "semantic gap" between the high-level symbolic models of the world that people use, and the low-level models of geometry, state dynamics, and perceptions that robots use. In this paper, we show how this gap can be bridged by inferring the best sequence of actions from a linguistic description and environmental features. This work improves upon previous work in three ways. First, by using a conditional random field (CRF), we learn the relative weight of environmental and linguistic features, enabling the system to learn the meanings of words and reducing the modeling effort in learning how to follow commands. Second, a number of long-range features are added, which help the system to use additional structure in the problem. Third, given a natural language command, we infer both the referred path and landmark directly, thereby requiring the algorithm to pick a landmark by which it should navigate. The CRF is demonstrated to have 15% error on a held-out dataset, compared with 39% error for a Markov random field (MRF). By analyzing the additional annotations necessary for this work, we also find that natural language route directions map sequentially onto the corresponding path and landmarks 99.6% of the time. The size of the referred landmark varies from 0 m² to 1964 m², and the length of the referred path varies from 0 m to 40.83 m.
Enhanced Visual Scene Understanding through Human-Robot Dialog
Johnson-Roberson, Matthew (Royal Institute of Technology (KTH)) | Bohg, Jeannette (Royal Institute of Technology (KTH)) | Kragic, Danica (Royal Institute of Technology (KTH)) | Skantze, Gabriel (Royal Institute of Technology (KTH)) | Gustafson, Joakim (Royal Institute of Technology (KTH)) | Carlson, Rolf (Royal Institute of Technology (KTH))
In this paper, we propose a novel human-robot-interaction framework for the purpose of rapid visual scene understanding. The task of the robot is to correctly enumerate how many separate objects there are in the scene and to describe them in terms of their attributes. Our approach builds on a state-of-the-art 3D segmentation method that segments stereo-reconstructed point clouds into object hypotheses, and combines it with a natural dialog system. By putting a "human in the loop", the robot gains knowledge about ambiguous situations beyond its own resolution. Specifically, we introduce an entropy-based system to spot the poorest object hypotheses and query the user for arbitration. Based on the information obtained from the human-to-robot dialog, the scene segmentation can be re-seeded and thereby improved. We present experimental results on real data that show improved segmentation performance compared to segmentation without interaction.
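A minimal sketch of entropy-based hypothesis selection follows. The abstract does not specify which distribution the entropy is computed over, so this example assumes each object hypothesis carries a discrete probability distribution (for instance, over "one object" versus "two merged objects"); the names and numbers are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def most_uncertain(hypotheses):
    """Return the name of the hypothesis whose distribution has the highest entropy."""
    return max(hypotheses, key=lambda name: entropy(hypotheses[name]))

# Hypothetical per-hypothesis distributions (assumed for illustration only).
hypotheses = {
    "object_a": [0.95, 0.05],  # confident segmentation
    "object_b": [0.5, 0.5],    # maximally ambiguous
    "object_c": [0.8, 0.2],
}

# The hypothesis the robot would ask the user about first.
query_target = most_uncertain(hypotheses)
```

Under these assumed values the robot would query the user about `object_b`, since a 50/50 split is the least certain case.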
Visual Salience and Reference Resolution in Situated Dialogues: A Corpus-based Evaluation
Schuette, Niels (Dublin Institute of Technology) | Kelleher, John (Dublin Institute of Technology) | Mac Namee, Brian (Dublin Institute of Technology)
Dialogues between humans and robots are necessarily situated. Exophoric references to objects in the shared visual context are very frequent in situated dialogues, for example when a human is verbally guiding a tele-operated mobile robot. We present an approach to automatically resolving exophoric referring expressions in a situated dialogue based on the visual salience of possible referents. We evaluate the effectiveness of this approach and a range of different salience metrics using data from the SCARE corpus which we have augmented with visual information. The results of our evaluation show that our computationally lightweight approach is successful, and so promising for use in human-robot dialogue systems.
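As a rough illustration of salience-based resolution, the sketch below picks, among visible objects matching the category named in a referring expression, the one with the highest salience score. The data structure, the category filter, and the scores are assumptions made for this example; the abstract does not define the salience metrics the authors evaluated.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VisibleObject:
    name: str
    category: str
    salience: float  # hypothetical score, e.g. derived from size and centrality in view

def resolve_reference(category: str, visible_objects: List[VisibleObject]) -> Optional[VisibleObject]:
    """Return the most salient visible object of the requested category, if any."""
    candidates = [obj for obj in visible_objects if obj.category == category]
    if not candidates:
        return None  # nothing in view matches the expression
    return max(candidates, key=lambda obj: obj.salience)

# Toy scene with made-up salience values.
scene = [
    VisibleObject("door_1", "door", 0.2),
    VisibleObject("door_2", "door", 0.7),
    VisibleObject("button_1", "button", 0.9),
]

referent = resolve_reference("door", scene)
missing = resolve_reference("cabinet", scene)
```

In this toy scene, "the door" resolves to `door_2`, and an expression naming a category that is not in view resolves to `None`.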
Toward Fast Mapping for Robot Adjective Learning
Petrosino, Allison (Wellesley College) | Gold, Kevin (Rochester Institute of Technology)
Fast mapping is a phenomenon by which children learn the meanings of novel adjectives after a very small number of exposures when the new word is contrasted with a known word. The present study was a preliminary test of whether machine learners could use such contrasts in unconstrained speech to learn adjective meanings and categories. Six decision tree-based learning methods were evaluated that use contrasting examples in order to work toward an adjective fast-mapping system for machine learners. Subjects tended to compare objects using adjectives of the same category, implying that such contrasts may be a useful source of data about adjective meaning, though none of the learning algorithms showed strong advantages over any other.
Modeling Human-Robot Interaction Based on Generic Interaction Patterns
Peltason, Julia (Bielefeld University) | Wrede, Britta (Bielefeld University)
While current techniques for human-robot interaction modeling are typically limited to a restrictive command-control style, traditional dialog modeling approaches are not directly applicable to robotics due to their lack of real-world integration. We present an approach that combines insights from dialog modeling with software-engineering demands that arise in robotics system research to provide a generalizable framework that can easily be applied to new scenarios. This goal is achieved by defining interaction patterns that combine abstract task states (such as task accepted or failed) with robot dialog acts (such as assertion or apology). An evaluation of the usability for robotic experts and novices showed that both groups were able to program 3 out of 5 dialog patterns in one hour while showing a steep learning curve. We argue that the proposed approach allows for less restricted and more informative human-robot interactions.
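One way to picture an interaction pattern is as a lookup from abstract task states to robot dialog acts. The sketch below uses only the states and acts named in the abstract; the specific pairing between them and all identifiers are assumptions for illustration, not the framework's actual API.

```python
from enum import Enum

class TaskState(Enum):
    ACCEPTED = "accepted"
    FAILED = "failed"

class DialogAct(Enum):
    ASSERTION = "assertion"
    APOLOGY = "apology"

# Hypothetical pattern: which dialog act the robot produces when the
# underlying task reports a given state. The pairing is assumed.
SIMPLE_PATTERN = {
    TaskState.ACCEPTED: DialogAct.ASSERTION,
    TaskState.FAILED: DialogAct.APOLOGY,
}

def robot_response(state: TaskState) -> DialogAct:
    """Look up the dialog act associated with a task state in the pattern."""
    return SIMPLE_PATTERN[state]

on_failure = robot_response(TaskState.FAILED)
```

Keeping the pattern as plain data, separate from the task code that emits the states, is one plausible reading of how such patterns could be reused across scenarios.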
Toward Integrating Natural-HRI into Spoken Dialog
Kanda, Takayuki (ATR Intelligent Robotics and Communication Laboratory)
This paper summarizes our previous work in modeling non-verbal behaviors for natural human-robot interaction (HRI) and discusses a path for integrating them into spoken dialogs. While some non-verbal behaviors can be considered “optional” elements to be added to a spoken dialog, others require a harmonized plan that simultaneously considers both spoken dialog and non-verbal behavior. The paper discusses such unique HRI features.