Grammars & Parsing
Visual Entity Linking: A Preliminary Study
Weegar, Rebecka (Lund University) | Hammarlund, Linus (Lund University) | Tegen, Agnes (University of Gothenburg) | Oskarsson, Magnus (Lund University) | Åström, Kalle (Lund University) | Nugues, Pierre (Lund University)
In this paper, we describe a system that jointly extracts entities appearing in images and mentioned in their accompanying captions. As input, the entity linking program takes a segmented image together with its caption. It consists of a sequence of processing steps (part-of-speech tagging, dependency parsing, and coreference resolution) that enable us to identify the entities as well as possible textual relations from the captions. The program uses the image regions labelled with a set of predefined categories and computes WordNet similarities between these labels and the entity names. Finally, the program links the entities it detected across the text and the images. We applied our system to the Segmented and Annotated IAPR TC-12 dataset, which we enriched with entity annotations, and obtained a correct assignment rate of 55.48%.
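As a rough illustration of the linking step, the sketch below scores each pair of caption entity and region label with a path-based similarity and assigns each entity its most similar label. The tiny taxonomy, the function names, and the example words are hypothetical stand-ins for WordNet and the dataset's category set; this is not the authors' implementation.

```python
# Toy hypernym taxonomy (child -> parent); a hypothetical stand-in for WordNet.
HYPERNYM = {
    "dog": "animal",
    "horse": "animal",
    "animal": "entity",
    "man": "person",
    "woman": "person",
    "person": "entity",
    "tree": "plant",
    "plant": "entity",
}

def ancestors(word):
    """Return the chain from word up to the root, word included."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def path_similarity(a, b):
    """1 / (1 + path length through the lowest common ancestor)."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = [w for w in up_a if w in up_b]
    if not common:
        return 0.0
    lca = common[0]  # first shared node on the way up from a
    distance = up_a.index(lca) + up_b.index(lca)
    return 1.0 / (1.0 + distance)

def link_entities(entities, region_labels):
    """Map each caption entity to the most similar region label."""
    return {
        e: max(region_labels, key=lambda label: path_similarity(e, label))
        for e in entities
    }

links = link_entities(["man", "horse"], ["person", "animal", "plant"])
print(links)
```

A real system would replace `path_similarity` with a WordNet similarity measure and would have to handle entities that match no region at all.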
Semantic Graph Construction for Weakly-Supervised Image Parsing
Xie, Wenxuan (Peking University) | Peng, Yuxin (Peking University) | Xiao, Jianguo (Peking University)
We investigate weakly-supervised image parsing, i.e., assigning class labels to image regions by using image-level labels only. Existing studies focus mainly on the formulation of the weakly-supervised learning problem, i.e., how to propagate class labels from images to regions given an affinity graph of regions. Notably, however, the affinity graph of regions, which is generally constructed in relatively simple settings in existing methods, is of crucial importance to the parsing performance, because the weakly-supervised parsing problem cannot be solved within a single image and the affinity graph enables label propagation among multiple images. In order to embed more semantics into the affinity graph, we propose novel criteria that carefully exploit the weak supervision information, and develop two graphs: the L1 semantic graph and the k-NN semantic graph. Experimental results demonstrate that the proposed semantic graphs not only capture more semantic relevance, but also perform significantly better than conventional graphs in image parsing.
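To make the role of the affinity graph concrete, here is a minimal sketch of a conventional k-NN graph over region features, followed by a simple label propagation restricted to each region's image-level labels. It shows the baseline setting the abstract refers to, not the proposed semantic graphs; the feature vectors, the averaging update, and all names are assumptions made for illustration.

```python
import math

def knn_graph(features, k):
    """Symmetric k-NN affinity graph as a dict: node -> set of neighbours."""
    n = len(features)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        by_distance = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in by_distance[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def propagate(graph, allowed, classes, iterations=20):
    """Average neighbour scores, keeping only labels of the region's image."""
    scores = {
        i: {c: (1.0 / len(allowed[i]) if c in allowed[i] else 0.0) for c in classes}
        for i in graph
    }
    for _ in range(iterations):
        updated = {}
        for i, neighbours in graph.items():
            row = {}
            for c in classes:
                if c not in allowed[i]:
                    row[c] = 0.0
                    continue
                mean = sum(scores[j][c] for j in neighbours) / len(neighbours)
                row[c] = 0.5 * scores[i][c] + 0.5 * mean
            total = sum(row.values())
            updated[i] = {c: v / total for c, v in row.items()}
        scores = updated
    return {i: max(classes, key=lambda c: scores[i][c]) for i in graph}

# Regions 0 and 1 come from one image labelled {sky, grass};
# region 2 from an image labelled {sky}; region 3 from one labelled {grass}.
features = [(0.0, 1.0), (1.0, 0.0), (0.1, 0.9), (0.9, 0.1)]
allowed = [{"sky", "grass"}, {"sky", "grass"}, {"sky"}, {"grass"}]
graph = knn_graph(features, k=1)
labels = propagate(graph, allowed, ["sky", "grass"])
print(labels)
```

Regions 0 and 1 cannot be told apart from their own image alone; the ambiguity is resolved only through their neighbours in other images, which is why the quality of the graph matters.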
A Hybrid Grammar-Based Approach for Learning and Recognizing Natural Hand Gestures
Sadeghipour, Amir (Bielefeld University) | Kopp, Stefan (Bielefeld University)
In this paper, we present a hybrid grammar formalism designed to learn structured models of natural iconic gesture performances that allow for compressed representation and robust recognition. We analyze a dataset of iconic gestures and show how the proposed Feature-based Stochastic Context-Free Grammar (FSCFG) can generalize over both structural and feature-based variations among different gesture performances.
Unsupervised Alignment of Natural Language Instructions with Video Segments
Naim, Iftekhar (University of Rochester) | Song, Young Chol (University of Rochester) | Liu, Qiguang (University of Rochester) | Kautz, Henry (University of Rochester) | Luo, Jiebo (University of Rochester) | Gildea, Daniel (University of Rochester)
We propose an unsupervised learning algorithm for automatically inferring the mappings between English nouns and corresponding video objects. Given a sequence of natural language instructions and an unaligned video recording, we simultaneously align each instruction to its corresponding video segment and align the nouns in each instruction to their corresponding objects in the video. While existing grounded language acquisition algorithms rely on pre-aligned supervised data (each sentence paired with a corresponding image frame or video segment), our algorithm aims to automatically infer the alignment from the temporal structure of the video and parallel text instructions. We propose two generative models that are closely related to the HMM and IBM Model 1 word alignment models used in statistical machine translation. We evaluate our algorithm on videos of biological experiments performed in wet labs, and demonstrate its capability of aligning video segments to text instructions and matching video objects to nouns in the absence of any direct supervision.
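For readers unfamiliar with IBM Model 1, the following is a minimal sketch of its EM training loop applied to noun-object pairs. It assumes the instruction-to-segment alignment is already given (the part the HMM-style model would handle), omits the NULL word, and uses made-up nouns and object identifiers; it illustrates the standard algorithm rather than the models proposed in the paper.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM estimate of t(noun | obj) from (nouns, objects) pairs per segment."""
    vocab = {n for nouns, _ in pairs for n in nouns}
    uniform = 1.0 / len(vocab)
    t = defaultdict(lambda: uniform)  # keyed by (noun, obj)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each noun's mass over the objects in its segment.
        for nouns, objects in pairs:
            for n in nouns:
                norm = sum(t[(n, o)] for o in objects)
                for o in objects:
                    frac = t[(n, o)] / norm
                    count[(n, o)] += frac
                    total[o] += frac
        # M-step: renormalise the expected counts per object.
        t = defaultdict(
            float, {(n, o): c / total[o] for (n, o), c in count.items()}
        )
    return t

def best_object(t, noun, objects):
    """Most likely object for a noun among those visible in one segment."""
    return max(objects, key=lambda o: t[(noun, o)])

# Hypothetical data: nouns per instruction, detected objects per segment.
pairs = [
    (["bottle", "tube"], ["B", "T"]),
    (["bottle", "flask"], ["B", "F"]),
    (["flask"], ["F"]),
]
t = ibm_model1(pairs)
print(best_object(t, "bottle", ["B", "T"]), best_object(t, "tube", ["B", "T"]))
```

No pair is labelled, yet co-occurrence across segments is enough for EM to separate the candidates: "flask" is pinned to `F` by the third segment, which pushes "bottle" toward `B` and, in turn, "tube" toward `T`.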
Joint Morphological Generation and Syntactic Linearization
Song, Linfeng (Chinese Academy of Science) | Zhang, Yue (Singapore University of Technology and Design) | Song, Kai (Northeastern University) | Liu, Qun (Dublin City University and Chinese Academy of Science)
There has been growing interest in stochastic methods for natural language generation (NLG). While most NLG pipelines separate morphological generation and syntactic linearization, the two tasks are closely related. In this paper, we study joint morphological generation and linearization, making use of word order and inflection information for both tasks and reducing error propagation. Experiments show that the joint method significantly outperforms a strong pipelined baseline (by 1.1 BLEU points). It also achieves the best reported result on the Generation Challenge 2011 shared task.
User Intent Identification from Online Discussions Using a Joint Aspect-Action Topic Model
Nobari, Ghasem Heyrani (National University of Singapore) | Chua, Tat-Seng (National University of Singapore)
Online discussions are growing as a popular, effective and reliable source of information for users because of their liveliness, flexibility and up-to-date information. Online discussions are usually developed and advanced by groups of users with various backgrounds and intents. However, because of the diversity of the topics and issues that users discuss, supervised methods are not able to accurately model such dynamic conditions. In this paper, we propose a novel unsupervised generative model to derive aspect-action pairs from online discussions. The proposed method simultaneously captures and models these two features, together with the relationships between them, in each thread. We assume that each user post is generated by a mixture of aspect and action topics. Therefore, we design a model that captures the latent factors that incorporate the aspect types and intended actions, which describe how users develop a topic in a discussion. In order to demonstrate the effectiveness of our approach, we empirically compare our model against state-of-the-art methods on a large-scale discussion dataset crawled from Apple Discussions, with over 3.3 million user posts from 340k discussion threads.
Structured Generative Models of Natural Source Code
Maddison, Chris J. | Tarlow, Daniel
We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.
Sentiment Analysis Using Dependency Trees and Named-Entities
Yasavur, Ugan (Florida International University) | Travieso, Jorge (Florida International University) | Lisetti, Christine (Florida International University) | Rishe, Naphtali David (Florida International University)
There is increasing interest in valence and emotion sensing using a variety of signals. Text, as a communication channel, attracts a substantial amount of interest for recognizing its underlying sentiment (valence or polarity), affect or emotion (e.g., happiness, sadness). We consider recognizing the valence of a sentence as a prior task to emotion sensing. In this article, we discuss our approach to classifying sentences in terms of emotional valence. Our supervised system performs syntactic and semantic analysis for feature extraction. It processes the interactions between words in sentences by using dependency parse trees, and it can decide the current polarity of named entities based on on-the-fly topic modeling. We compared three rule-based approaches and two supervised approaches (i.e., Naive Bayes and Maximum Entropy). We trained and tested our system using the SemEval-2007 affective text dataset, which contains news headlines extracted from news websites. Our results show that our systems outperform the systems demonstrated in SemEval-2007.
Natural Language Access to Enterprise Data
Waltinger, Ulli (Siemens AG) | Tecuci, Dan (Siemens Corporation) | Olteanu, Mihaela (Siemens AG) | Mocanu, Vlad (Siemens AG) | Sullivan, Sean (Siemens Energy Inc.)
This paper describes USI Answers, a natural language question answering system for enterprise data. We report on the progress towards the goal of offering easy access to enterprise data to a large number of business users, most of whom are not familiar with the specific syntax or semantics of the underlying data sources. Additional complications come from the nature of the data, which is both structured and unstructured. The proposed solution allows users to express questions in natural language, makes apparent the system's interpretation of the query, and allows easy query adjustment and reformulation. The application is in use by more than 1500 users from Siemens Energy. We evaluate our approach on a data set consisting of fleet data.
Compositional Operators in Distributional Semantics
The recent developments in the syntactical and morphological analysis of natural language text constitute the first step towards a more ambitious goal, that of assigning a proper form of meaning to arbitrary text compounds. Indeed, for certain really "intelligent" applications, such as machine translation, question-answering systems, paraphrase detection, or automatic essay scoring, to name just a few, there will always exist a gap between raw linguistic information (such as part-of-speech labels, for example) and the knowledge of the real world that is needed to complete the task in a satisfactory way. Semantic analysis has exactly this role, aiming to close (or reduce as much as possible) this gap by linking the linguistic information with semantic representations that embody this elusive real-world knowledge. The traditional way of adding semantics to sentences is a syntax-driven compositional approach: every word in the sentence is associated with a primitive symbol or a predicate, and these are combined into larger and larger logical forms based on the syntactical rules of the grammar. At the end of the syntactical analysis, the logical representation of the whole sentence is a complex formula that can be fed to a theorem prover for further processing. Although such an approach seems intuitive, it has been shown to be rather inefficient for any practical application (for example, Bos and Markert (2006) get very low recall scores for a textual entailment task).
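The syntax-driven composition described above can be sketched in a few lines: each word maps to a constant or a predicate, and a binary parse tree dictates how they are applied to one another. The lexicon, the tree encoding, and the string form of the output are all hypothetical choices made for illustration; a real system would build formulas in a logic that a theorem prover accepts.

```python
# Hypothetical lexicon: each word maps to a constant or a curried predicate.
LEXICON = {
    "John": "john",
    "Mary": "mary",
    "sleeps": lambda subj: f"sleep({subj})",
    "loves": lambda obj: lambda subj: f"love({subj}, {obj})",
}

def interpret(tree):
    """Compose a logical form bottom-up over a binary parse tree.

    A tree is either a word (str) or a pair (left, right). At each node the
    functional child is applied to the argument child.
    """
    if isinstance(tree, str):
        return LEXICON[tree]
    left, right = (interpret(child) for child in tree)
    return left(right) if callable(left) else right(left)

# S -> NP VP, VP -> V NP
print(interpret(("John", ("loves", "Mary"))))  # love(john, mary)
print(interpret(("Mary", "sleeps")))           # sleep(mary)
```

The sketch also hints at the brittleness the text mentions: any word missing from the lexicon, or any tree the rules do not anticipate, makes the whole derivation fail.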