Grammars & Parsing
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction
Qi, Siyuan, Jia, Baoxiong, Zhu, Song-Chun
Future prediction on sequence data (e.g., video or audio) requires algorithms to capture non-Markovian and compositional properties of high-level semantics. Context-free grammars are a natural choice for capturing such properties, but traditional grammar parsers (e.g., the Earley parser) only take symbolic sentences as inputs. In this paper, we generalize the Earley parser to parse sequence data which is neither segmented nor labeled. This generalized Earley parser integrates a grammar parser with a classifier to find the optimal segmentation and labels, and makes top-down future predictions. Experiments show that our method significantly outperforms other approaches for future human activity prediction.
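To make the "symbolic sentences as inputs" limitation concrete, the following is a minimal sketch of a classical Earley recognizer in Python. It is not the generalized parser described in the abstract: it only accepts a sequence of already-labeled tokens. The toy grammar, the function name `earley_recognize`, and the `Item` tuple are illustrative assumptions, and the sketch assumes a grammar without empty productions.

```python
from collections import namedtuple

# An Earley item: a rule lhs -> rhs, a dot position inside rhs,
# and the input index at which recognition of this rule started.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    `grammar` maps each nonterminal to a list of right-hand sides
    (tuples of symbols). Any symbol that is not a key of `grammar`
    is treated as a terminal. Assumes no empty productions.
    """
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(position, item):
        if item not in chart[position]:
            chart[position].append(item)

    for rhs in grammar[start_symbol]:
        add(0, Item(start_symbol, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while we scan it
            item = chart[i][j]
            j += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for rhs in grammar[symbol]:
                        add(i, Item(symbol, rhs, 0, i))
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the next token matches the expected terminal.
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.start]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(
        it.lhs == start_symbol and it.dot == len(it.rhs) and it.start == 0
        for it in chart[len(tokens)]
    )


# Hypothetical toy grammar over pre-labeled tokens.
TOY_GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "noun")],
    "VP": [("verb", "NP"), ("verb",)],
}

accepted = earley_recognize(TOY_GRAMMAR, "S", ["det", "noun", "verb", "det", "noun"])
rejected = earley_recognize(TOY_GRAMMAR, "S", ["det", "verb"])
```

The recognizer needs every input position to carry a discrete label before parsing starts, which is the assumption the paper relaxes for unsegmented, unlabeled sequence data.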
Developing NLP Applications Using NLTK in Python
Have you ever faced challenges in understanding language and planning sentences while performing Natural Language Processing? Do you wish to overcome these problems and go beyond the basic techniques like bag-of-words? This course is designed with advanced solutions that will take you from newbie to pro in performing Natural Language Processing with NLTK. In this course, you will come across various concepts covering natural language understanding, Natural Language Processing, and syntactic analysis. It consists of everything you need to efficiently use NLTK to implement text classification, identify parts of speech, tag words, and more.
Temporal Event Knowledge Acquisition via Identifying Narratives
Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal "before/after" event knowledge across sentences in narrative stories. Double temporality states that a narrative story often describes a sequence of events in chronological order, so the temporal order of events matches their textual order. We explored narratology principles and built a weakly supervised approach that identifies 287k narrative paragraphs from three large text corpora. We then extracted rich temporal event knowledge from these narrative paragraphs. Such event knowledge is shown to improve temporal relation classification and to outperform several recent neural network models on the narrative cloze task.
Mining Procedures from Technical Support Documents
Gupta, Abhirut, Khosla, Abhay, Singh, Gautam, Dasgupta, Gargi
Guided troubleshooting is an inherent task in the domain of technical support services. When a customer experiences an issue with the functioning of a technical service or a product, an expert user helps guide the customer through a set of steps comprising a troubleshooting procedure. The objective is to identify the source of the problem through a set of diagnostic steps and observations, and arrive at a resolution. Procedures containing these sets of diagnostic steps and observations in response to different problems are common artifacts in the body of technical support documentation. The ability to use machine learning and linguistics to understand and leverage these procedures for applications like intelligent chatbots or robotic process automation is crucial. Existing research on question answering or intelligent chatbots does not look within procedures or attempt to understand them deeply. In this paper, we outline a system for mining procedures from technical support documents. We create models for solving important subproblems like extraction of procedures, identifying decision points within procedures, identifying blocks of instructions corresponding to these decision points, and mapping instructions within a decision block. We also release a dataset containing our manual annotations on publicly available support documents, to promote further research on the problem.
Semi-supervised classification by reaching consensus among modalities
Zhu, Zining, Novikova, Jekaterina, Rudzicz, Frank
This paper introduces transductive consensus networks (TCNs), an extension of consensus networks (CNs), for semi-supervised learning. A TCN performs multi-modal classification based on a few available labels by urging the interpretations of different modalities to resemble each other. We formulate the multi-modal, semi-supervised learning problem, and put forward TCN and several of its variants for this task. To understand the mechanisms of TCN, we formulate the similarity of the interpretations as the negative relative Jensen-Shannon divergence, and show that a consensus state beneficial for classification requires a stable but not perfect similarity between the interpretations. We show that TCN outperforms the best benchmark algorithms given only 20 and 80 labeled samples on the Bank Marketing and DementiaBank datasets, respectively, and aligns with their performance given more labeled samples.
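As a rough illustration of measuring similarity through a divergence, the sketch below computes the standard Jensen-Shannon divergence between two discrete distributions and negates it. This is an assumption-laden example: the abstract does not define the "relative" variant it uses, so the code shows only the textbook quantity, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Standard Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies between 0 and 1.
    """
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


def similarity(p, q):
    """Similarity as the negative divergence (higher means more alike).

    Illustrative only; the paper's 'relative' variant may differ.
    """
    return -js_divergence(p, q)


# Two hypothetical "interpretations" expressed as class distributions.
modality_a = [0.7, 0.2, 0.1]
modality_b = [0.6, 0.3, 0.1]
score = similarity(modality_a, modality_b)
```

Identical distributions give a divergence of zero, and distributions with disjoint support give the maximum of one bit, so the negated value ranges from -1 (no agreement) to 0 (perfect agreement).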
Generative Code Modeling with Graphs
Brockschmidt, Marc, Allamanis, Miltiadis, Gaunt, Alexander L., Polozov, Oleksandr
Generative modeling of source code is an interesting structured prediction problem, requiring reasoning about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. The generative procedure interleaves grammar-driven expansion steps with graph augmentation and neural message passing steps. An experimental evaluation shows that our new model can generate semantically meaningful expressions, outperforming a range of strong baselines.
SkeletonScore: Guiding a Semantic Parser to Better Results by Example
Bose, Ritwik (University of Rochester) | Allen, James (University of Rochester)
The task of semantic parsing is to map natural-language sentences to logical forms representing the underlying meanings of those sentences. Typically, resolving semantic ambiguity is considered to be a side effect of semantic parsing. However, a large number of parsing errors can be attributed to incorrect sense disambiguation in the first place. This can arise from the selection of an incorrect semantic role or semantic type by the parser. This paper applies a knowledge-based algorithm to guide a semantic parser to simultaneously select better semantic types and roles. The algorithm takes into account semantic roles and ontology types to reduce restriction violations arising from incorrect semantic role or type choices, hence increasing the overall accuracy of the semantic parser.
Large-Scale QA-SRL Parsing
FitzGerald, Nicholas, Michael, Julian, He, Luheng, Zettlemoyer, Luke
We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost.
CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service
Lippi, Marco, Palka, Przemyslaw, Contissa, Giuseppe, Lagioia, Francesca, Micklitz, Hans-Wolfgang, Sartor, Giovanni, Torroni, Paolo
For instance, consumer protection agencies and/or consumer organisations may be involved to different degrees, there may or may not be fines for using unfair contractual terms, etc. (Schulte-Nölke et al. 2008). One thing that all member states have in common is that if a business uses unfair terms in its contracts, in principle there is always a competent party with the authority to challenge such contracts. Unfortunately, the legal mechanisms for enforcing the prohibition of unfair contract terms have so far failed to effectively counter this practice. As reported in the literature (Loos and Luzak 2016), and as our own research indicates (Micklitz et al. 2017), unfair contractual terms are, as of today, widely used in the ToS of online platforms. In our previous research (Micklitz et al. 2017), we developed a theoretical model of the tasks that human lawyers currently need to carry out before starting legal proceedings concerning the abstract control of fairness of clauses.