Grammars & Parsing
Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss
Jehl, Laura, Lawrence, Carolin, Riezler, Stefan
In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation (MLE). In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training (MRT) on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.
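The bipolar idea can be illustrated with a minimal sketch. It assumes a small candidate list with model log-probabilities and metric feedback in [0, 1], and it picks the promoted ("hope") and discouraged ("fear") outputs by adding or subtracting the feedback from the model score. The function name and the selection rule are illustrative assumptions, not necessarily the exact objectives defined in the paper.

```python
def bipolar_ramp_loss(log_probs, rewards):
    """Toy sequence-level bipolar ramp loss over a candidate list.

    log_probs: model log-probability of each candidate output
    rewards:   metric feedback for each candidate (higher is better)
    The selection rule below is an assumption made for illustration.
    """
    indices = range(len(log_probs))
    # Hope: likely under the model AND well rated by the metric.
    hope = max(indices, key=lambda i: log_probs[i] + rewards[i])
    # Fear: likely under the model BUT poorly rated by the metric.
    fear = max(indices, key=lambda i: log_probs[i] - rewards[i])
    # Minimizing this raises the hope's probability and lowers the fear's.
    loss = -log_probs[hope] + log_probs[fear]
    return loss, hope, fear


loss, hope, fear = bipolar_ramp_loss([-1.0, -2.0, -0.5], [0.9, 0.2, 0.1])
print(loss, hope, fear)
```

A non-bipolar variant would keep only the first term of the loss, so nothing would push probability mass away from the poorly rated candidate.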
Learning to Generate Synthetic 3D Training Data through Hybrid Gradient
Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they can help train a network to perform well on real images, because a graphics-based generation pipeline requires numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimizes the generation of 3D training data based on what we call "hybrid gradient". We parametrize the design decisions as a real vector, and combine the approximate gradient and the analytical gradient to obtain the hybrid gradient of the network performance with respect to this vector. We evaluate our approach on the task of estimating surface normals from a single image. Experiments on standard benchmarks show that our approach can outperform the prior state of the art on optimizing the generation of 3D training data, particularly in terms of computational efficiency.
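One plausible reading of the hybrid gradient is a chain rule in which one factor is computed analytically and the other is approximated. The sketch below assumes a black-box generator whose Jacobian is estimated by finite differences and a loss whose gradient with respect to the generated data is known in closed form. The functions `render` and `loss_grad` are hypothetical stand-ins, not the paper's pipeline.

```python
def render(beta):
    """Hypothetical black-box generator: parameters -> 'training data'."""
    return [beta[0] + 2.0 * beta[1], beta[0] * beta[1]]


def loss_grad(data):
    """Analytical gradient of the toy loss sum(d**2) with respect to the data."""
    return [2.0 * d for d in data]


def hybrid_gradient(beta, eps=1e-6):
    """Combine an analytical gradient with a finite-difference Jacobian."""
    base = render(beta)
    g = loss_grad(base)  # analytical part
    grad = []
    for j in range(len(beta)):
        shifted = list(beta)
        shifted[j] += eps
        out = render(shifted)
        # Approximate column j of the generator's Jacobian.
        column = [(o - b) / eps for o, b in zip(out, base)]
        # Chain rule: d loss / d beta_j = sum_i g_i * d data_i / d beta_j
        grad.append(sum(gi * ci for gi, ci in zip(g, column)))
    return grad


print(hybrid_gradient([1.0, 2.0]))
```

Under these assumptions, only the generator needs extra forward evaluations, one per parameter, while the downstream gradient comes from ordinary differentiation.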
Multi-Criteria Chinese Word Segmentation with Transformer
Qiu, Xipeng, Pei, Hengzhi, Yan, Hang, Huang, Xuanjing
Different linguistic perspectives cause many diverse segmentation criteria for Chinese word segmentation (CWS). Most existing methods focus on improving the performance of single-criterion CWS. However, it is interesting to exploit these heterogeneous segmentation criteria and mine their common underlying knowledge. In this paper, we propose a concise and effective model for multi-criteria CWS, which utilizes a shared fully-connected self-attention model to segment the sentence according to a criterion indicator. Experiments on eight datasets with heterogeneous segmentation criteria show that the performance of each corpus obtains a significant improvement, compared to single-criterion learning.
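A minimal sketch of the criterion-indicator idea: the same character sequence is fed to a shared model with a different indicator token in front, and the predicted tags are decoded into words. The token format (`<pku>`, `<msr>`) and the BMES tagging scheme are assumptions for illustration, and the tags below are hand-written rather than model output.

```python
def build_input(sentence, criterion):
    """Prepend a hypothetical criterion token to the character sequence."""
    return [f"<{criterion}>"] + list(sentence)


def tags_to_words(chars, tags):
    """Decode BMES tags (an assumed scheme) into a list of words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    if current:  # flush an unfinished word at the end of the sentence
        words.append(current)
    return words


chars = list("abcd")
print(build_input("abcd", "pku"))                  # input under one criterion
print(build_input("abcd", "msr"))                  # same text, other criterion
print(tags_to_words(chars, ["B", "E", "S", "S"]))  # one possible segmentation
print(tags_to_words(chars, ["B", "M", "E", "S"]))  # a different segmentation
```

Only the indicator token differs between criteria, so all other parameters can be shared across corpora.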
Program Synthesis and Semantic Parsing with Learned Code Idioms
Shin, Richard, Allamanis, Miltiadis, Brockschmidt, Marc, Polozov, Oleksandr
Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time. In this work, we present PATOIS, a system that allows a neural program synthesizer to explicitly interleave high-level and low-level reasoning at every generation step. It accomplishes this by automatically mining common code idioms from a given corpus, incorporating them into the underlying language for neural synthesis, and training a tree-based neural synthesizer to use these idioms during code generation. We evaluate PATOIS on two complex semantic parsing datasets and show that using learned code idioms improves the synthesizer's accuracy.
Event extraction based on open information extraction and ontology
The work presented in this master's thesis extracts a set of events from texts written in natural language. The approach builds on the basic notions of information extraction and open information extraction. First, we applied an open information extraction (OIE) system for relation extraction, to highlight the importance of OIE in event extraction, and we used an ontology for event modeling. We evaluated the results of our approach with test metrics. The two-level event extraction approach showed good performance but requires substantial expert intervention in the construction of classifiers, which is time-consuming. In this context, we proposed an approach that reduces expert intervention in relation extraction, entity recognition, and reasoning, which are automatic and based on techniques of adaptation and correspondence. Finally, to demonstrate the relevance of the extracted results, we conducted a set of experiments using different test metrics as well as a comparative study.
Text IQ, a machine learning platform for parsing sensitive corporate data, raises $12.6M – TechCrunch
Text IQ, a machine learning system that parses and understands sensitive corporate data, has raised $12.6 million in Series A funding led by FirstMark Capital, with participation from Sierra Ventures. Text IQ started as co-founder Apoorv Agarwal's Columbia thesis project titled "Social Network Extraction From Text." The algorithm he built was able to read a novel, like Jane Austen's "Emma," for example, and understand the social hierarchy and interactions between characters. This people-centric approach to parsing unstructured data eventually became the kernel of Text IQ, which helps corporations find what they're looking for in a sea of unstructured, and highly sensitive, data. The platform started as a tool used by corporate legal teams.
SEntNet: Source-aware Recurrent Entity Network for Dialogue Response Selection
Pei, Jiahuan, Stienstra, Arent, Kiseleva, Julia, de Rijke, Maarten
Dialogue response selection is an important part of Task-oriented Dialogue Systems (TDSs); it aims to predict an appropriate response given a dialogue context. Obtaining key information from a complex, long dialogue context is challenging, especially when different sources of information are available, e.g., the user's utterances, the system's responses, and results retrieved from a knowledge base (KB). Previous work ignores the type of information source and merges sources for response selection. However, accounting for the source type may lead to remarkable differences in the quality of response selection. We propose the Source-aware Recurrent Entity Network (SEntNet), which is aware of different information sources for the response selection process. SEntNet achieves this by employing source-specific memories to exploit differences in the usage of words and syntactic structure from different information sources (user, system, and KB). Experimental results show that SEntNet obtains 91.0% accuracy on the Dialog bAbI dataset, outperforming prior work by 4.7%. On the DSTC2 dataset, SEntNet obtains an accuracy of 41.2%, beating source-unaware recurrent entity networks by 2.4%.
uma-pi1/OPIEC-pipeline
OPIEC is an Open Information Extraction (OIE) corpus consisting of more than 341M triples extracted from the entire English Wikipedia. Each triple in the corpus comes with rich metadata: each token of the subj/obj/rel along with NLP annotations (POS tag, NER tag, ...), the provenance sentence along with its dependency parse, original (golden) links from Wikipedia, sentence order, space/time, etc. For more details concerning the construction, analysis and statistics of the corpus, read the AKBC paper "OPIEC: An Open Information Extraction Corpus". To download the data and get additional resources, please visit the project page. For reading the data, please visit the GitHub repository OPIEC.
Hierarchical Decision Making by Generating and Following Natural Language Instructions
Hu, Hengyuan, Yarats, Denis, Gong, Qucheng, Tian, Yuandong, Lewis, Mike
We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales. We gather a dataset of 76 thousand pairs of instructions and executions from human play, and train instructor and executor models. Experiments show that models using natural language as a latent variable significantly outperform models that directly imitate human actions. The compositional structure of language proves crucial to its effectiveness for action representation. We also release our code, models and data.
Using Structured Representation and Data: A Hybrid Model for Negation and Sentiment in Customer Service Conversations
Misra, Amita, Bhuiyan, Mansurul, Mahmud, Jalal, Tripathy, Saurabh
Twitter customer service interactions have recently emerged as an effective platform to respond to and engage with customers. In this work, we explore the role of negation in customer service interactions, particularly as applied to sentiment analysis. We define rules to identify true negation cues and scope that are better suited to conversational data than to existing general review data. Using semantic knowledge and syntactic structure from constituency parse trees, we propose an algorithm for scope detection that performs comparably to a state-of-the-art BiLSTM. We further investigate the results of negation scope detection for the sentiment prediction task on customer service conversation data using both a traditional SVM and a neural network. We propose an antonym-dictionary-based method for negation applied to a CNN-LSTM combination model for sentiment analysis. Experimental results show that the antonym-based method outperforms the previous lexicon-based and neural network methods.
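The antonym-based treatment of negation can be sketched as a preprocessing step applied before the sentiment model sees the text. The example below assumes a toy antonym dictionary, a fixed cue list, and a scope of exactly one following token. The paper derives scope from constituency parse trees, which this sketch does not attempt.

```python
# Toy resources; both are illustrative assumptions, not the paper's lexicon.
ANTONYMS = {"good": "bad", "happy": "unhappy", "helpful": "unhelpful"}
NEGATION_CUES = {"not", "never", "no"}


def apply_antonym_negation(tokens):
    """Replace 'cue + word' with the word's antonym when one is known."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.lower() in NEGATION_CUES and nxt is not None and nxt.lower() in ANTONYMS:
            out.append(ANTONYMS[nxt.lower()])  # drop the cue, keep the antonym
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


print(apply_antonym_negation(["the", "agent", "was", "not", "helpful"]))
```

Tokens with no dictionary entry are left unchanged, so a cue such as "no" in "no reply yet" passes through to the downstream model as is.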