Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break this labeled-data bottleneck, a promising approach is to exploit easily obtained indirect supervision knowledge, usually referred to as distant supervision (DS). However, traditional DS methods mostly exploit only one specific kind of indirect supervision knowledge, namely the relations/facts in a given knowledge base, and thus often suffer from a lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate for the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by using Markov logic to exploit the consistency between relation labels, between relations and arguments, and between neighboring instances, our method significantly outperforms traditional DS approaches.
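The classic distant-supervision heuristic the abstract builds on can be sketched in a few lines: any sentence that mentions both arguments of a known knowledge-base fact is labeled with that fact's relation. The function and variable names below are illustrative, not from the paper; the second example sentence shows the label noise that motivates joint inference.

```python
# Minimal sketch of the distant-supervision labeling heuristic (illustrative
# names, not the paper's implementation): a sentence mentioning both
# arguments of a KB fact is labeled with that fact's relation.

def label_sentences(kb_facts, sentences):
    """kb_facts: iterable of (subject, relation, object) triples.
    sentences: list of raw strings.
    Returns noisy (sentence, relation) training pairs."""
    labeled = []
    for sent in sentences:
        for subj, rel, obj in kb_facts:
            if subj in sent and obj in sent:  # naive entity matching
                labeled.append((sent, rel))
    return labeled

facts = {("Barack Obama", "born_in", "Honolulu")}
sents = [
    "Barack Obama was born in Honolulu.",          # correct match
    "Barack Obama visited Honolulu last year.",    # spurious match: DS noise
]
pairs = label_sentences(facts, sents)
```

Both sentences get labeled `born_in`, even though only the first expresses that relation; reducing exactly this kind of uncertainty is what the joint inference over relation labels, arguments, and neighboring instances is for.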
Aggarwal, Apoorv (Indian Institute of Technology Bombay) | Ghoshal, Sandip (Indian Institute of Technology Bombay) | Shetty, Ankith M. S. (Indian Institute of Technology Bombay) | Sinha, Suhit (Indian Institute of Technology Bombay) | Ramakrishnan, Ganesh (Indian Institute of Technology Bombay) | Kar, Purushottam (Indian Institute of Technology Kanpur) | Jain, Prateek (Microsoft Research)
The problem of multi-instance multi-label learning (MIML) requires a bag of instances to be assigned the set of labels most relevant to the bag as a whole. The problem finds numerous applications in machine learning, computer vision, and natural language processing settings where only partial or distant supervision is available. We present a novel method for optimizing multivariate performance measures in the MIML setting. Our approach, MIML-perf, uses a novel plug-in technique and offers a seamless way to optimize a wide variety of performance measures, such as macro- and micro-F measure and average precision, which are the performance measures of choice in multi-label learning domains. MIML-perf offers two key benefits over the state of the art. First, across a diverse range of benchmark tasks, ranging from relation extraction to text categorization and scene classification, MIML-perf offers superior performance compared to state-of-the-art methods designed specifically for these tasks. Second, MIML-perf operates with significantly reduced running times compared to other methods, often by an order of magnitude or more.
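For readers unfamiliar with the multivariate measures the abstract names, the sketch below computes micro- and macro-F1 over multi-label predictions. It only illustrates the measures being optimized, not the paper's plug-in technique itself, and all names are illustrative.

```python
# Illustrative computation of micro- vs macro-F1 for multi-label predictions
# (the measures MIML-perf optimizes); not the paper's algorithm.

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """y_true, y_pred: per-instance sets of labels."""
    labels = set().union(*y_true, *y_pred)
    tps = {l: 0 for l in labels}
    fps = dict(tps)
    fns = dict(tps)
    for truth, pred in zip(y_true, y_pred):
        for l in labels:
            if l in pred and l in truth:
                tps[l] += 1      # true positive for label l
            elif l in pred:
                fps[l] += 1      # predicted but not true
            elif l in truth:
                fns[l] += 1      # true but not predicted
    # micro: pool counts over all labels, then take F1
    micro = f1(sum(tps.values()), sum(fps.values()), sum(fns.values()))
    # macro: per-label F1, then average
    macro = sum(f1(tps[l], fps[l], fns[l]) for l in labels) / len(labels)
    return micro, macro
```

The two aggregations can disagree substantially when label frequencies are skewed, which is why methods that directly optimize a chosen measure, rather than plain accuracy, matter in multi-label domains.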
Natarajan, Sriraam (Wake Forest University) | Picado, Jose (Wake Forest University) | Khot, Tushar (University of Wisconsin-Madison) | Kersting, Kristian (University of Bonn) | Ré, Christopher (University of Wisconsin-Madison) | Shavlik, Jude (University of Wisconsin-Madison)
One of the challenges in information extraction is the requirement of human-annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision, i.e., looking into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. In this work, we take a different approach: we create weakly supervised examples for relations by using commonsense knowledge. The key innovation is that this commonsense knowledge is completely independent of the natural language text. This helps when learning the full model for information extraction, as opposed to simply learning the parameters of a known CRF or MLN. We demonstrate on two domains that this form of weak supervision yields superior results when learning structure, compared to simply using the gold-standard labels.
We introduce a new entity typing task: given a sentence with an entity mention, the goal is to predict a set of free-form phrases (e.g., skyscraper, songwriter, or criminal) that describe appropriate types for the target entity. This formulation allows us to use a new type of distant supervision at large scale: head words, which indicate the type of the noun phrases they appear in. We show that these ultra-fine types can be crowdsourced, and we introduce new evaluation sets that are much more diverse and fine-grained than existing benchmarks. We present a model that can predict open types and is trained using a multitask objective that pools our new head-word supervision with prior supervision from entity linking. Experimental results demonstrate that our model is effective in predicting entity types at varying granularity; it achieves state-of-the-art performance on an existing fine-grained entity typing benchmark and sets baselines for our newly introduced datasets. Our data and model can be downloaded from: http://nlp.cs.washington.edu/entity_type
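The head-word signal works because the syntactic head of a noun phrase usually names the type of the thing it describes ("a 40-story skyscraper in Chicago" is a skyscraper). The toy heuristic below (take the last token before a preposition or relative pronoun) is a deliberate simplification for illustration; the paper relies on proper syntactic parses, and the word lists here are assumptions.

```python
# Toy sketch of head-word extraction as a type signal: the head of a noun
# phrase doubles as a free-form type label for the entity it describes.
# Real systems use dependency parses; this last-token-before-a-function-word
# heuristic is only illustrative.

STOP_AFTER_HEAD = {"in", "of", "at", "on", "for", "with", "from",
                   "that", "who", "which", ","}

def head_word(noun_phrase):
    tokens = noun_phrase.lower().replace(",", " , ").split()
    head = tokens[0]
    for tok in tokens:
        if tok in STOP_AFTER_HEAD:
            break            # modifiers after this point don't hold the head
        head = tok           # last content token seen so far
    return head

examples = ["a 40-story skyscraper in Chicago",
            "the songwriter who wrote the hit"]
types = [head_word(np) for np in examples]  # "skyscraper", "songwriter"
```

Because head words come for free from unlabeled text, they scale to a far larger and more diverse type inventory than any hand-built ontology.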
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to a 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides an average 132% improvement in predictive performance over prior heuristic approaches and comes within an average of 3.60% of the predictive performance of large hand-curated training sets.
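The labeling-function idea can be sketched without the real Snorkel API: several noisy heuristics vote on each example (or abstain), and an aggregator combines their votes into a training label. Here a plain majority vote stands in for Snorkel's generative model, which additionally learns the accuracies and correlations of the functions; all function names and the spam/ham task are illustrative.

```python
# Minimal sketch of labeling functions + vote aggregation (NOT the real
# Snorkel API; majority vote stands in for its learned generative model).

from collections import Counter

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_contains_link(text):
    # Heuristic: messages with links tend to be spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short(text):
    # Heuristic: very short messages tend to be benign.
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_money(text):
    # Heuristic: money talk tends to be spam.
    return SPAM if "$" in text else ABSTAIN

def majority_label(text, lfs):
    """Aggregate non-abstaining votes; ABSTAIN if no function fires."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_contains_link, lf_short, lf_money]
label_a = majority_label("win $1000 now at http://spam.example", lfs)  # SPAM
label_b = majority_label("see you soon", lfs)                          # HAM
```

The point of the generative model over this naive vote is that individual heuristics can be wrong or redundant; modeling their unknown accuracies and correlations is what lets Snorkel produce useful probabilistic labels without any ground truth.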