Grammars & Parsing
Inference on Syntactic and Semantic Structures for Machine Comprehension
Li, Chenrui (East China Normal University) | Wu, Yuanbin (East China Normal University) | Lan, Man (East China Normal University)
Hidden variable models are important tools for solving open domain machine comprehension tasks and have achieved remarkable accuracy on many question answering benchmark datasets. Existing models impose strong independence assumptions on hidden variables, which leaves the interactions among them unexplored. Here we introduce linguistic structures to help capture global evidence in hidden variable modeling. In the proposed algorithms, question-answer pairs are scored based on structured inference results on parse trees and semantic frames, with the aim of assigning hidden variables in a globally optimal way. Experiments on the MCTest dataset demonstrate that the proposed models are highly competitive with state-of-the-art machine comprehension systems.
A Neural Transition-Based Approach for Semantic Dependency Graph Parsing
Wang, Yuxuan (Harbin Institute of Technology) | Che, Wanxiang (Harbin Institute of Technology) | Guo, Jiang (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology)
Semantic dependency graphs have recently been proposed as an extension of tree-structured syntactic or semantic representations for natural language sentences. Their distinguishing structural property is multi-headedness: nodes may have multiple heads, which turns parsing into a directed acyclic graph (DAG) parsing problem. Yet most statistical parsers have focused exclusively on shallow bi-lexical tree structures, and DAG parsing remains under-explored. In this paper, we propose a neural transition-based parser that uses a variant of the list-based arc-eager transition algorithm for dependency graph parsing. In particular, two non-trivial improvements are proposed for representing the key components of the transition system, to better capture the semantics of segments and internal sub-graph structures. We test our parser on the SemEval-2016 Task 9 dataset (Chinese) and the SemEval-2015 Task 18 dataset (English). On both benchmark datasets, we obtain superior or comparable results to the best performing systems. Our parser can be further improved with a simple ensemble mechanism, resulting in state-of-the-art performance.
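The multi-head property can be illustrated with a toy transition system. The sketch below is a simplified, hypothetical rendering of list-based arc-eager transitions over a stack `sigma`, a pass list `delta`, and a buffer `beta`. The action names, the `State` class, and the example sentence are assumptions made for illustration, not the authors' exact transition set, and preconditions such as acyclicity checks are omitted.

```python
from collections import deque

# Hypothetical action inventory for this sketch.
ACTIONS = {
    "NO-SHIFT", "NO-REDUCE", "NO-PASS",
    "LEFT-REDUCE", "LEFT-PASS",
    "RIGHT-SHIFT", "RIGHT-PASS",
}


class State:
    """Toy parser state: sigma (stack), delta (pass list), beta (buffer)."""

    def __init__(self, n_tokens):
        self.sigma = []
        self.delta = deque()
        self.beta = deque(range(n_tokens))
        self.arcs = set()  # (head, dependent, label) triples

    def _shift(self):
        # Return passed-over tokens to sigma, then move the buffer front onto it.
        self.sigma.extend(self.delta)
        self.delta.clear()
        self.sigma.append(self.beta.popleft())

    def apply(self, action, label=None):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action == "NO-SHIFT":
            self._shift()
            return
        i, j = self.sigma.pop(), self.beta[0]
        if action.startswith("LEFT"):
            self.arcs.add((j, i, label))  # buffer front becomes a head of i
        elif action.startswith("RIGHT"):
            self.arcs.add((i, j, label))  # stack top becomes a head of j
        if action.endswith("PASS"):
            self.delta.appendleft(i)      # set i aside; it returns on the next shift
        elif action.endswith("SHIFT"):
            self.sigma.append(i)
            self._shift()
        # "...-REDUCE": i is simply discarded


tokens = ["She", "wants", "leave"]
state = State(len(tokens))
for action, label in [
    ("NO-SHIFT", None),
    ("LEFT-PASS", "agent"),      # wants -> She, but keep "She" for another head
    ("NO-SHIFT", None),
    ("RIGHT-PASS", "content"),   # wants -> leave
    ("LEFT-REDUCE", "agent"),    # leave -> She (second head of "She")
    ("NO-SHIFT", None),
]:
    state.apply(action, label)

heads_of_she = sorted(tokens[h] for h, d, _ in state.arcs if d == 0)
print(heads_of_she)
```

The PASS actions are what make multiple heads possible here: a token that has received one head is set aside rather than removed, so a later buffer token can attach to it again.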
Neural Character-level Dependency Parsing for Chinese
Li, Haonan (Shanghai Jiao Tong University) | Zhang, Zhisong (Shanghai Jiao Tong University) | Ju, Yuqi (Shanghai Jiao Tong University) | Zhao, Hai (Shanghai Jiao Tong University)
This inconvenience makes us do necessary restorations from character-level dependency parsing results. Character-level dependency parsing covers all levels of language processing within a Chinese sentence. Our model simplifies the pipeline into two steps, character POS tagging and character dependency parsing, while traditional processing has to handle word segmentation, POS tagging for words, and word-level dependency parsing, as shown in Figure 2. With different processing hierarchies, we also provide complete matches (CM) as one metric for the related evaluation. The character parsing performance comparison is given in Table 1, in which the following observations are obtained. (Table 2: Character-level evaluation.) ... shows that even integrating the least character position information is beneficial to the parser. Finally, effective integration of two levels of tags boosts the performance most. The CHAR WORD strategy is more straightforward but also brings too many tags or labels and thus will slow down the parsing and make the learning more ... reason might be that since characters instead of words are ...
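The complete match (CM) metric mentioned here can be sketched as follows, assuming each parse is represented as a list of head indices (0 for the root). The function names and toy data are hypothetical, and the unlabeled attachment score is included only as a contrast; the exact evaluation protocol used by the authors may differ.

```python
def complete_match(gold, pred):
    """Fraction of sentences whose predicted heads all match the gold heads."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def unlabeled_attachment_score(gold, pred):
    """Fraction of individual tokens (or characters) with the correct head."""
    total = sum(len(g) for g in gold)
    correct = sum(
        gh == ph
        for g, p in zip(gold, pred)
        for gh, ph in zip(g, p)
    )
    return correct / total


# Toy data: two sentences, heads given per position (0 = root).
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 2], [2, 0]]

cm = complete_match(gold, pred)
uas = unlabeled_attachment_score(gold, pred)
print(cm, uas)
```

CM is stricter than a per-token score: one wrong head anywhere in a sentence makes the whole sentence count as a miss.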
Question Answering as Global Reasoning Over Semantic Abstractions
Khashabi, Daniel (University of Pennsylvania) | Khot, Tushar (Allen Institute for Artificial Intelligence) | Sabharwal, Ashish (Allen Institute for Artificial Intelligence) | Roth, Dan (University of Pennsylvania)
We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%.
Computer-Assisted Authoring for Natural Language Story Scripts
Sanghrajka, Rushit (Disney Research) | Witoń, Wojciech (Disney Research) | Schriber, Sasha (Disney Research) | Gross, Markus (Disney Research) | Kapadia, Mubbasir (Rutgers University, Disney Research)
In order to assist scriptwriters during the process of story-writing, we have developed a system that can extract information from natural language stories and support story-centric as well as character-centric reasoning. These inference capabilities are exposed to the user through intuitive querying systems, allowing the scriptwriter to ask the system questions about story and character information. We introduce knowledge bytes as atoms of information and demonstrate that the system can parse text into a stream of knowledge bytes and apply these reasoning capabilities through logical inference.
Scene-Centric Joint Parsing of Cross-View Videos
Qi, Hang (University of California, Los Angeles) | Xu, Yuanlu (University of California, Los Angeles) | Yuan, Tao (University of California, Los Angeles) | Wu, Tianfu (NC State University) | Zhu, Song-Chun (University of California, Los Angeles)
Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing framework that integrates view-centric proposals into scene-centric parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of view embed rich appearance and geometry correlations and that knowledge fragments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework represents such correlations and constraints explicitly and generates semantic scene-centric parse graphs. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.
Cross-Domain Human Parsing via Adversarial Feature and Label Adaptation
Liu, Si (Institute of Information Engineering, Chinese Academy of Sciences) | Sun, Yao (Institute of Information Engineering, Chinese Academy of Sciences) | Zhu, Defa (Institute of Information Engineering, Chinese Academy of Sciences) | Ren, Guanghui (Institute of Information Engineering, Chinese Academy of Sciences) | Chen, Yu (JD.com) | Feng, Jiashi (National University of Singapore) | Han, Jizhong (Institute of Information Engineering, Chinese Academy of Sciences)
Human parsing has been extensively studied recently due to its wide applications in many important scenarios. Mainstream fashion parsing models (i.e., parsers) focus on parsing high-resolution, clean images. However, directly applying parsers trained on benchmarks of high-quality samples to a particular application scenario in the wild, e.g., a canteen, airport or workplace, often gives unsatisfactory performance due to domain shift. In this paper, we explore a new and challenging cross-domain human parsing problem: taking the benchmark dataset with extensive pixel-wise labeling as the source domain, how can we obtain a satisfactory parser on a new target domain without requiring any additional manual labeling? To this end, we propose a novel and efficient cross-domain human parsing model to bridge the cross-domain differences in terms of visual appearance and environment conditions and fully exploit commonalities across domains. Our proposed model explicitly learns a feature compensation network, which is specialized for mitigating the cross-domain differences. A discriminative feature adversarial network is introduced to supervise the feature compensation so that it effectively reduces the discrepancy between the feature distributions of the two domains. In addition, our proposed model introduces a structured label adversarial network to guide the parsing results of the target domain to follow the high-order relationships of the structured labels shared across domains. The proposed framework is end-to-end trainable, practical and scalable in real applications. Extensive experiments are conducted in which the LIP dataset is the source domain and 4 different datasets without any annotations, including surveillance videos, movies and runway shows, are evaluated as target domains. The results consistently confirm the data efficiency and performance advantages of the proposed method for the challenging cross-domain human parsing problem.
Residual Encoder Decoder Network and Adaptive Prior for Face Parsing
Guo, Tianchu (Beijing Samsung Telecommunication) | Kim, Youngsung (Samsung Advanced Institute of Technology) | Zhang, Hui (Beijing Samsung Telecommunication) | Qian, Deheng (Beijing Samsung Telecommunication) | Yoo, ByungIn (Samsung Advanced Institute of Technology) | Xu, Jingtao (Beijing Samsung Telecommunication) | Zou, Dongqing (Beijing Samsung Telecommunication) | Han, Jae-Joon (Samsung Advanced Institute of Technology) | Choi, Changkyu (Samsung Advanced Institute of Technology)
Face parsing assigns a semantic label to every pixel in a facial image and can be applied in various applications including face recognition, facial beautification, affective computing and animation. While much progress has been made in this field, current state-of-the-art methods still fail to extract truly effective features and restore accurate score maps, especially for facial parts that have large variations of deformation and fairly similar appearance, e.g. mouth, eyes and thin eyebrows. In this paper, we propose a novel pixel-wise face parsing method called Residual Encoder Decoder Network (RED-Net), which combines a feature-rich encoder-decoder framework with an adaptive prior mechanism. Our encoder-decoder framework extracts features with ResNet and decodes them by elaborately fusing the residual architectures into deconvolution. This framework learns more effective features than those learnt by decoding with interpolation or classic deconvolution operations. To overcome the appearance ambiguity between facial parts, an adaptive prior mechanism is proposed in terms of the decoder prediction confidence, allowing the final result to be refined. The experimental results on two public datasets demonstrate that our method significantly outperforms the state of the art, improving F-measure from 0.854 to 0.905 on the Helen dataset and pixel accuracy from 95.12% to 97.59% on the LFW dataset. In particular, convincing qualitative examples show that our method parses eye, eyebrow, and lip regions more accurately.
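The two reported metrics can be sketched on a toy label map. This is a minimal illustration assuming label maps are nested lists of integer class ids; the function names and data are hypothetical, and the actual Helen and LFW evaluation protocols (for example, which classes are averaged) may differ.

```python
def pixel_accuracy(pred, gold):
    """Fraction of pixels whose predicted label equals the gold label."""
    pairs = [(p, g) for prow, grow in zip(pred, gold) for p, g in zip(prow, grow)]
    return sum(p == g for p, g in pairs) / len(pairs)


def f_measure(pred, gold, label):
    """Harmonic mean of precision and recall for one class label."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gold):
        for p, g in zip(prow, grow):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy 2x3 label maps: 0 = background, 1 and 2 = two facial parts.
gold = [[0, 1, 1],
        [0, 2, 2]]
pred = [[0, 1, 2],
        [0, 2, 2]]

acc = pixel_accuracy(pred, gold)
f1_part1 = f_measure(pred, gold, label=1)
f1_part2 = f_measure(pred, gold, label=2)
print(acc, f1_part1, f1_part2)
```

Pixel accuracy is dominated by large regions such as background, whereas a per-class F-measure exposes errors on small parts, which is one reason both kinds of number are commonly reported.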
Using Syntax to Ground Referring Expressions in Natural Images
Cirik, Volkan (Language Technologies Institute, Carnegie Mellon University) | Berg-Kirkpatrick, Taylor (Language Technologies Institute, Carnegie Mellon University) | Morency, Louis-Philippe (Language Technologies Institute, Carnegie Mellon University)
We introduce GroundNet, a neural network for referring expression recognition, the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.
RNN-Based Sequence-Preserved Attention for Dependency Parsing
Zhou, Yi (Fudan University) | Zhou, Junying (Fudan University) | Liu, Lu (Fudan University) | Feng, Jiangtao (Fudan University) | Peng, Haoyuan (Fudan University) | Zheng, Xiaoqing (Fudan University)
Recurrent neural networks (RNNs) combined with attention mechanisms have proved useful for various NLP tasks including machine translation, sequence labeling and syntactic parsing. The attention mechanism is usually applied by estimating the weights (or importance) of inputs and taking the weighted sum of inputs as derived features. Although such features have demonstrated their effectiveness, they may fail to capture sequence information because a simple weighted sum is used to produce them. The order of the words matters to the meaning and structure of a sentence, especially for syntactic parsing, which aims to recover structure from a sequence of words. In this study, we propose an RNN-based attention to capture relevant and sequence-preserved features from a sentence, and use the derived features to perform dependency parsing. We evaluated graph-based and transition-based parsing models enhanced with the RNN-based sequence-preserved attention on both the English PTB and Chinese CTB datasets. The experimental results show that the enhanced systems achieve a significant increase in parsing accuracy.
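The claim that a weighted sum discards order can be checked on a small example. The sketch below compares plain dot-product attention pooling with a toy linear recurrence over the attention-weighted inputs. The recurrence `recurrent_pool`, its `decay` parameter, and the example vectors are assumptions chosen for illustration; they are not the RNN-based attention proposed by the authors.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors, query):
    """Dot-product attention weights of each input vector for the query."""
    return softmax([sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors])


def attention_pool(vectors, query):
    """Standard attention feature: weighted sum of the inputs."""
    weights = attention_weights(vectors, query)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]


def recurrent_pool(vectors, query, decay=0.5):
    """Toy order-sensitive variant: h_t = decay * h_{t-1} + w_t * x_t."""
    weights = attention_weights(vectors, query)
    h = [0.0] * len(query)
    for w, v in zip(weights, vectors):
        h = [decay * h_d + w * v_d for h_d, v_d in zip(h, v)]
    return h


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, -0.2]
reversed_vectors = vectors[::-1]

sum_fwd = attention_pool(vectors, query)
sum_rev = attention_pool(reversed_vectors, query)
rec_fwd = recurrent_pool(vectors, query)
rec_rev = recurrent_pool(reversed_vectors, query)

# The weighted sum is the same for both orders; the recurrence is not.
print(sum_fwd, sum_rev)
print(rec_fwd, rec_rev)
```

Because each weight depends only on its own input and the query, reordering the inputs reorders the terms of the sum without changing its value, while any recurrent update makes the result depend on position.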