Goto

Collaborating Authors

 Grammars & Parsing


A Bayesian LDA-based model for semi-supervised part-of-speech tagging

Neural Information Processing Systems

We present a novel Bayesian model for semi-supervised part-of-speech tagging. Our model extends the Latent Dirichlet Allocation model and incorporates the intuition that words' distributions over tags, p(t w), are sparse. Our model outper- forms the best previously proposed model for this task on a standard dataset.


Rapid Inference on a Novel AND/OR graph for Object Detection, Segmentation and Parsing

Neural Information Processing Systems

In this paper we formulate a novel AND/OR graph representation capable of describing the different configurations of deformable articulated objects such as horses. The representation makes use of the summarization principle so that lower level nodes in the graph only pass on summary statistics to the higher level nodes. The probability distributions are invariant to position, orientation, and scale. We develop a novel inference algorithm that combined a bottom-up process for proposing configurations for horses together with a top-down process for refining and validating these proposals. The strategy of surround suppression is applied to ensure that the inference time is polynomial in the size of input data.


Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

Neural Information Processing Systems

We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors.


Recursive Segmentation and Recognition Templates for 2D Parsing

Neural Information Processing Systems

Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomial- time parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image pars- ing which outputs image segmentation and object recognition.


Synergies in learning words and their referents

Neural Information Processing Systems

This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these.


Image Parsing with Stochastic Scene Grammar

Neural Information Processing Systems

In contrast to previous scene labeling work that applied discriminative classifiers to pixels (or super-pixels), we use a generative Stochastic Scene Grammar (SSG). This grammar represents the compositional structures of visual entities from scene categories, 3D foreground/background, 2D faces, to 1D lines. The grammar includes three types of production rules and two types of contextual relations. Production rules: (i) AND rules represent the decomposition of an entity into sub-parts; (ii) OR rules represent the switching among sub-types of an entity; (iii) SET rules rep- resent an ensemble of visual entities. Contextual relations: (i) Cooperative " " relations represent positive links between binding entities, such as hinged faces of a object or aligned boxes; (ii) Competitive "-" relations represents negative links between competing entities, such as mutually exclusive boxes. We design an efficient MCMC inference algorithm, namely Hierarchical cluster sampling, to search in the large solution space of scene configurations. The algorithm has two stages: (i) Clustering: It forms all possible higher-level structures (clusters) from lower-level entities by production rules and contextual relations.


Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Neural Information Processing Systems

We describe an approach to speed-up inference with latent variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which bounds the difference between the probabilities calculated by the algorithm and the true probabilities that the approximated model gives. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance.


Identifiability and Unmixing of Latent Parse Trees

Neural Information Processing Systems

This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences.


Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

arXiv.org Artificial Intelligence

Abstract-- We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions. Previous works based on rule-based language parsing or scene-centric visual representation have restrictions on the form of instructions and reference objects or require large amounts of training data. We propose an object-centric framework that leverages foundation models to ground the reference objects and spatial relations for placement, which is more sample efficient and generalizable. Experiments indicate that our model can achieve a 97.75% success rate of placement with only 0.26M trainable parameters. Object placement is an essential task in human-robot contains only one object in the scene and does not support interaction.


Natural Language Robot Programming: NLP integrated with autonomous robotic grasping

arXiv.org Artificial Intelligence

In this paper, we present a grammar-based natural language framework for robot programming, specifically for pick-and-place tasks. Our approach uses a custom dictionary of action words, designed to store together words that share meaning, allowing for easy expansion of the vocabulary by adding more action words from a lexical database. We validate our Natural Language Robot Programming (NLRP) framework through simulation and real-world experimentation, using a Franka Panda robotic arm equipped with a calibrated camera-in-hand and a microphone. Participants were asked to complete a pick-and-place task using verbal commands, which were converted into text using Google's Speech-to-Text API and processed through the NLRP framework to obtain joint space trajectories for the robot. Our results indicate that our approach has a high system usability score. The framework's dictionary can be easily extended without relying on transfer learning or large data sets. In the future, we plan to compare the presented framework with different approaches of human-assisted pick-and-place tasks via a comprehensive user study.