Grammars & Parsing


Learning to Parse and Ground Natural Language Commands to Robots

AAAI Conferences

This paper describes a weakly supervised approach for understanding natural language commands to robotic systems. Our approach, called the combinatory grounding graph (CGG), takes as input natural language commands paired with groundings and infers the space of parses that best describe how to ground the natural language command. The command is understood in a compositional way, generating a latent hierarchical parse tree that involves relations (such as "to" or "by") and categories (such as "the elevators" or "the doors"). We present an example parse-grounding tree and show that our system can successfully cluster the meanings of objects and locations.


Considering State in Plan Recognition with Lexicalized Grammars

AAAI Conferences

This paper describes an extension of the ELEXIR (Engine for LEXicalized Intent Recognition) system (Geib 2009; Geib 2011) with a world model. This is a significant increase in the expressiveness of the plan recognition system and allows a number of additions to the algorithm, most significantly conditioning the probabilities of recognized plans on the state of the world during execution. Since ELEXIR belongs to the family of grammatical methods for plan recognition, which treat plan recognition as a parsing problem, this paper also briefly discusses how this extension relates to state-of-the-art proposals for probabilistic parsing in the natural language community.


Learning to Interpret Natural Language Instructions

AAAI Conferences

We address the problem of training an artificial agent to follow verbal commands using a set of instructions paired with demonstration traces of appropriate behavior. From this data, a mapping from instructions to tasks is learned, enabling the agent to carry out new instructions in novel environments. Our system consists of three components: semantic parsing (SP), inverse reinforcement learning (IRL), and task abstraction (TA). SP parses sentences into logical form representations, but when learning begins, the domain- and task-specific meanings of these representations are unknown. IRL takes demonstration traces and determines the likely reward functions that gave rise to these traces, defined over a set of provided features. TA combines results from SP and IRL over a set of training instances to create abstract goal definitions of tasks. TA also provides SP with domain-specific meanings for its logical forms and provides IRL with the set of task-relevant features.


Make it So: Continuous, Flexible Natural Language Interaction with an Autonomous Robot

AAAI Conferences

While highly constrained language can be used for robot control, robots that can operate as fully autonomous subordinate agents communicating via rich language remain an open challenge. Toward this end, we developed an autonomous system that supports natural, continuous interaction with the operator through language before, during, and after mission execution. The operator communicates instructions to the system through natural language and is given feedback on how each instruction was understood as the system constructs a logical representation of its orders. While the plan is executed, the operator is updated on relevant progress via language and images and can change the robot's orders. Unlike many other integrated systems of this type, our system's language interface is built using robust, general-purpose parsing and semantics systems that do not rely on domain-specific grammars. This system demonstrates a new level of continuous natural language interaction and a novel approach that uses general-purpose language and planning components instead of components hand-built for the domain. Language-enabled autonomous systems of this type represent important progress toward the goal of integrating robots as effective members of human teams.


Fine-Grained Entity Recognition

AAAI Conferences

Entity Recognition (ER) is a key component of relation extraction systems and many other natural-language processing applications. Unfortunately, most ER systems are restricted to producing labels from a small set of entity classes, e.g., person, organization, location or miscellaneous. In order to understand text intelligently and extract a wide range of information, it is useful to determine more precisely the semantic classes of entities mentioned in unstructured text. This paper defines a fine-grained set of 112 tags, formulates the tagging problem as multi-class, multi-label classification, describes an unsupervised method for collecting training data, and presents the FIGER implementation. Experiments show that the system accurately predicts the tags for entities. Moreover, it provides useful information for a relation extraction system, increasing the F1 score by 93%. We make FIGER and its data available as a resource for future work.
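In multi-label classification a single entity mention may receive several tags at once (for example a coarse tag and a finer subtype). The sketch below shows one generic way to decode such output: score every tag with its own linear weight vector and keep all tags above a threshold. It is not FIGER's learning algorithm; the tag names, features, weights, threshold, and the fall-back-to-best-tag rule are all hypothetical choices for illustration.

```python
from typing import Dict, List

# Hypothetical model: one linear weight vector per tag over sparse string features.
Weights = Dict[str, Dict[str, float]]


def score(features: Dict[str, float], w: Dict[str, float]) -> float:
    """Dot product of a sparse feature vector with one tag's weight vector."""
    return sum(value * w.get(name, 0.0) for name, value in features.items())


def predict_tags(features: Dict[str, float], weights: Weights,
                 threshold: float = 0.0) -> List[str]:
    """Multi-label decoding: return every tag whose score exceeds `threshold`.

    If no tag clears the threshold, fall back to the single best-scoring tag
    so that each mention still receives a label.
    """
    scores = {tag: score(features, w) for tag, w in weights.items()}
    chosen = sorted(tag for tag, s in scores.items() if s > threshold)
    if not chosen and scores:
        chosen = [max(scores, key=scores.get)]
    return chosen


# Made-up tags, features, and weights, purely for illustration.
weights: Weights = {
    "person": {"head=actor": 2.0, "capitalized": 0.5},
    "person/actor": {"head=actor": 1.5},
    "location": {"prep=in": 1.0, "head=actor": -1.0},
}
print(predict_tags({"head=actor": 1.0, "capitalized": 1.0}, weights))
# prints ['person', 'person/actor']
```

Thresholding each tag independently is what lets one mention carry both a coarse and a fine tag; a single-label (argmax) decoder would have to pick only one.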


A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video

AAAI Conferences

We are developing a testbed for learning by demonstration combining spoken language and sensor data in a natural real-world environment. Microsoft Kinect RGB-Depth cameras allow us to infer high-level visual features, such as the relative position of objects in space, with greater precision and less training than required by traditional systems. Speech is recognized and parsed using a “deep” parsing system, so that language features are available at the word, syntactic, and semantic levels. We collected an initial data set of 10 episodes of 7 individuals demonstrating how to “make tea”, and created a “gold standard” hand annotation of the actions performed in each. Finally, we are constructing “baseline” HMM-based activity recognition models using the visual and language features, in order to be ready to evaluate the performance of our future work on deeper and more structured models.
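An HMM-based activity recognizer typically labels each time step with a hidden activity by Viterbi decoding. The following is a minimal log-space Viterbi sketch, assuming discrete observation symbols; the activity names, observation symbols, and probabilities are invented for illustration and do not come from the testbed, whose models use visual and language features.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: List[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely hidden-state sequence for `obs` (log-space Viterbi)."""
    if not obs:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any state path ending in s at time t
    best = [{s: log(start[s]) + log(emit[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(((p, best[t - 1][p] + log(trans[p][s])) for p in states),
                              key=lambda pair: pair[1])
            best[t][s] = score + log(emit[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-activity model for a tea-making episode.
states = ["boil_water", "pour_water"]
start = {"boil_water": 0.8, "pour_water": 0.2}
trans = {"boil_water": {"boil_water": 0.7, "pour_water": 0.3},
         "pour_water": {"boil_water": 0.1, "pour_water": 0.9}}
emit = {"boil_water": {"kettle_on": 0.7, "cup_near_kettle": 0.3},
        "pour_water": {"kettle_on": 0.1, "cup_near_kettle": 0.9}}
print(viterbi(["kettle_on", "kettle_on", "cup_near_kettle"], states, start, trans, emit))
# prints ['boil_water', 'boil_water', 'pour_water']
```

Working in log space avoids numeric underflow on long episodes, where products of many small probabilities would otherwise round to zero.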


Parsing Outdoor Scenes from Streamed 3D Laser Data Using Online Clustering and Incremental Belief Updates

AAAI Conferences

In this paper, we address the problem of continually parsing a stream of 3D point cloud data acquired from a laser sensor mounted on a road vehicle. We leverage an online star clustering algorithm coupled with an incremental belief update in an evolving undirected graphical model. The fusion of these techniques allows the robot to parse streamed data and to continually improve its understanding of the world. The core competency produced is an ability to infer object classes from similarities based on appearance and shape features, and to concurrently combine that with a spatial smoothing algorithm incorporating geometric consistency. This formulation of feature-space star clustering modulating the potentials of a spatial graphical model is entirely novel. In our method, the two sources of information, feature similarity and geometric consistency, are fed continually into the system, improving the belief over the class distributions as new data arrives. The algorithm obviates the need for hand-labeled training data and makes no a priori assumptions about the number or characteristics of object categories. Rather, they are learnt incrementally over time from streamed input data. In experiments performed on real 3D laser data from an outdoor scene, we show that our approach is capable of obtaining an ever-improving unsupervised scene categorization.
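Star clustering builds a graph whose edges join items with similarity at or above a threshold sigma, then repeatedly promotes the highest-degree unmarked vertex to a star center whose neighbors become its satellites. The paper uses an online variant coupled with a graphical model; the sketch below shows only the simpler offline star cover, assuming cosine similarity over plain feature vectors. The feature vectors and sigma value are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def star_clusters(vectors: List[Sequence[float]], sigma: float) -> Dict[int, Set[int]]:
    """Offline star clustering over a sigma-thresholded similarity graph.

    Returns {center_index: member indices}. Members are the center plus all of
    its graph neighbours, so a vertex may appear in more than one cluster.
    """
    n = len(vectors)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= sigma:
                adj[i].add(j)
                adj[j].add(i)
    marked: Set[int] = set()
    clusters: Dict[int, Set[int]] = {}
    # Visit vertices from highest to lowest degree (ties broken by index);
    # every still-unmarked vertex becomes a star center.
    for v in sorted(range(n), key=lambda i: (-len(adj[i]), i)):
        if v in marked:
            continue
        clusters[v] = {v} | adj[v]
        marked |= clusters[v]
    return clusters


# Hypothetical appearance/shape feature vectors for five point-cloud segments.
vectors = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05], [0, 0, 1]]
print(star_clusters(vectors, sigma=0.9))
```

Because the number of clusters falls out of the graph rather than being fixed in advance, this family of methods fits the abstract's point about making no a priori assumption on the number of object categories.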


Collective Nominal Semantic Role Labeling for Tweets

AAAI Conferences

Tweets have become an increasingly popular source of fresh information. We investigate the task of Nominal Semantic Role Labeling (NSRL) for tweets, which aims to identify predicate-argument structures defined by nominals in tweets. Studies of this task can benefit fine-grained information extraction and retrieval from tweets. There are two main challenges in this task: 1) the lack of information in a single tweet, rooted in the short and noisy nature of tweets; and 2) the recovery of implicit arguments. We propose jointly conducting NSRL on multiple similar tweets using a graphical model, leveraging the redundancy in tweets to tackle these challenges. Extensive evaluations on a human-annotated data set demonstrate that our method outperforms two baselines with an absolute gain of 2.7% in F1.


Opinion Target Extraction Using a Shallow Semantic Parsing Framework

AAAI Conferences

In this paper, we present a simplified shallow semantic parsing approach to extracting opinion targets. This is done by formulating opinion target extraction (OTE) as a shallow semantic parsing problem with the opinion expression as the predicate and the corresponding targets as its arguments. In principle, our parsing approach to OTE differs from the state-of-the-art sequence labeling approach in two respects. First, we model OTE at the parse-tree level, where abundant structured syntactic information is available, rather than at the word-sequence level, where only lexical information is available. Second, we focus on determining whether a constituent, rather than a word, is an opinion target, via a simplified shallow semantic parsing framework. Evaluation on two datasets shows that structured syntactic information plays a critical role in capturing the domination relationship between an opinion expression and its targets. It also shows that our parsing approach greatly outperforms the state-of-the-art sequence labeling approach.
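The key shift described above is from labeling words to classifying parse-tree constituents. The abstract does not say how each constituent is scored, so the sketch below only illustrates the constituent-level view: it enumerates the labeled spans of a hand-written parse tree and applies a hypothetical heuristic (keep NP constituents that do not contain the opinion word) in place of the paper's classifier. The tree encoding, example sentence, and `candidate_targets` are all assumptions.

```python
from typing import List, Tuple, Union

# A parse tree is either a word (leaf) or a tuple (label, child, child, ...).
Tree = Union[str, tuple]
Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def constituents(tree: Tree, start: int = 0) -> Tuple[List[Span], int]:
    """Collect every labeled constituent with its token span (post-order)."""
    if isinstance(tree, str):
        return [], start + 1  # a leaf covers exactly one token
    label, *children = tree
    spans: List[Span] = []
    pos = start
    for child in children:
        child_spans, pos = constituents(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def candidate_targets(tree: Tree, predicate_index: int) -> List[str]:
    """Hypothetical heuristic: NP constituents that do not contain the opinion word."""
    spans, _ = constituents(tree)
    words = leaves(tree)
    return [" ".join(words[s:e]) for label, s, e in spans
            if label == "NP" and not (s <= predicate_index < e)]


# "The screen is very bright", with the opinion expression "bright" at index 4.
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "screen")),
        ("VP", ("VBZ", "is"), ("ADJP", ("RB", "very"), ("JJ", "bright"))))
print(candidate_targets(tree, predicate_index=4))  # prints ['The screen']
```

Treating "The screen" as one unit is what the constituent view buys: a word-level tagger would have to get the boundary of the target right token by token.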


Simple Robust Grammar Induction with Combinatory Categorial Grammars

AAAI Conferences

We present a simple EM-based grammar induction algorithm for Combinatory Categorial Grammar (CCG) that achieves state-of-the-art performance by relying on a minimal number of very general linguistic principles. Unlike previous work on unsupervised parsing with CCGs, our approach has no prior language-specific knowledge, and discovers all categories automatically. Additionally, unlike other approaches, our grammar remains robust when parsing longer sentences, performing as well as or better than other systems. We believe this is a natural result of using an expressive grammar formalism with an extended domain of locality.
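In CCG, words carry categories such as `NP` or `(S\NP)/NP`, and a small set of combinatory rules builds larger constituents from them. The sketch below implements only the two application rules (forward `X/Y Y => X` and backward `Y X\Y => X`) inside a tiny CKY-style recognizer; composition and type-raising are omitted. The category assignments are written by hand for illustration, whereas the paper's algorithm induces categories automatically with EM, which is not shown here.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# A category is an atomic symbol ("S", "NP") or a functor (result, slash, argument).
Cat = Union[str, Tuple["Cat", str, "Cat"]]


def show(c: Cat) -> str:
    """Render a category, parenthesizing every complex sub-category."""
    if isinstance(c, str):
        return c
    res, slash, arg = c

    def wrap(x: Cat) -> str:
        return show(x) if isinstance(x, str) else f"({show(x)})"

    return f"{wrap(res)}{slash}{wrap(arg)}"


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def cky(lexical: List[Cat]) -> Set[Cat]:
    """Return every category derivable for the whole sequence (application only)."""
    n = len(lexical)
    chart: Dict[Tuple[int, int], Set[Cat]] = {(i, i + 1): {c} for i, c in enumerate(lexical)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell: Set[Cat] = set()
            for k in range(i + 1, j):  # split point between the two children
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        for rule in (forward_apply, backward_apply):
                            result = rule(left, right)
                            if result is not None:
                                cell.add(result)
            chart[(i, j)] = cell
    return chart.get((0, n), set())


# Hand-assigned categories for "John likes Mary".
TRANSITIVE: Cat = (("S", "\\", "NP"), "/", "NP")
print(show(TRANSITIVE))                 # prints (S\NP)/NP
print(cky(["NP", TRANSITIVE, "NP"]))   # prints {'S'}
```

Because a transitive verb's category already encodes which arguments it takes and on which side, the grammar needs only these few generic rules; this is the "extended domain of locality" the abstract refers to.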