Inductive Learning
Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation
Başkaya, Osman, Jurgens, David
Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context, and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words. While unsupervised techniques have been proposed to overcome this data sparsity problem, such techniques have not outperformed supervised methods. In this paper, we propose a new approach to building semi-supervised WSD systems that combines a small amount of sense-annotated data with information from Word Sense Induction, a fully-unsupervised technique that automatically learns the different senses of a word based on how it is used. In three experiments, we show how sense induction models may be effectively combined to ultimately produce high-performance semi-supervised WSD systems that exceed the performance of state-of-the-art supervised WSD techniques trained on the same sense-annotated data. We anticipate that our results and released software will also benefit evaluation practices for sense induction systems and those working in low-resource languages by demonstrating how to quickly produce accurate WSD systems with minimal annotation effort.
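One common way to turn induced senses into a WSD system using a small sense-annotated set is to learn a mapping from induced senses to gold senses. The sketch below assumes that mapping approach: it estimates P(gold | induced) from co-occurrence counts and then scores each gold sense by the induced-sense weights of a new instance. It may not be the exact combination method used in the paper. The sense labels such as "bank.finance" and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_sense_map(annotated):
    """annotated: (induced_sense, gold_sense) pairs from the small labeled set.
    Returns P(gold | induced) as a nested dict."""
    counts = defaultdict(Counter)
    for induced, gold in annotated:
        counts[induced][gold] += 1
    sense_map = {}
    for induced, gold_counts in counts.items():
        total = sum(gold_counts.values())
        sense_map[induced] = {g: c / total for g, c in gold_counts.items()}
    return sense_map


def label_instance(induced_weights, sense_map):
    """induced_weights: induced_sense -> weight assigned by a WSI model to one
    new instance. Returns the best-scoring gold sense, or None when none of the
    induced senses was seen in the annotated data."""
    scores = Counter()
    for induced, weight in induced_weights.items():
        for gold, prob in sense_map.get(induced, {}).items():
            scores[gold] += weight * prob
    if not scores:
        return None
    return scores.most_common(1)[0][0]


annotated = [("c1", "bank.finance"), ("c1", "bank.finance"),
             ("c2", "bank.river"), ("c1", "bank.river")]
sense_map = learn_sense_map(annotated)
```

With this map, an instance weighted 0.3 on c1 and 0.7 on c2 scores 0.2 for bank.finance and 0.8 for bank.river, so it is labeled bank.river.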
[Q] ELI5: Why *not* have a royal rumble between all Supervised Learning techniques? • /r/MachineLearning
You asked "which algorithm is the best?", but you answered "which algorithms should newcomers try first?". The first question is entirely problem dependent (and technically so is the second). However, experienced practitioners will generally know what to recommend for the second task at first blush, and would generally agree with your choices (some Naive Bayes, some linear models). I would also add random forest / boosted trees to that "which to try first" list. However, if I presented you with a time-series problem, all these methods would fall flat on their face without good feature engineering or a model that explicitly captures dependencies over samples. This is why it is problem dependent.
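A minimal sketch of the feature-engineering point for time series, assuming a univariate series and a fixed window of past values (the window size and function names are hypothetical). The lagged values become ordinary features that any of the models above can use. The train/test split keeps time order instead of shuffling, so the model is never trained on observations from after the ones it is tested on.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a univariate series into (features, targets): each row holds the
    previous n_lags values, and the target is the value that follows them."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return features, targets


def time_ordered_split(features, targets, test_fraction=0.2):
    """Hold out the most recent rows rather than a random subset."""
    cut = int(len(features) * (1 - test_fraction))
    return features[:cut], targets[:cut], features[cut:], targets[cut:]


X, y = make_lagged_dataset([10, 12, 13, 15, 18, 20], n_lags=2)
# X == [[10, 12], [12, 13], [13, 15], [15, 18]], y == [13, 15, 18, 20]
X_train, y_train, X_test, y_test = time_ordered_split(X, y, test_fraction=0.25)
```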
Social Business Spotlight Blog
Gloria Lombardi speaks with software startup advisor Steve Ardire to explore the state of Artificial Intelligence (AI) and its implications for the future of work. In Part 1 of this post, we covered the meaning of AI vs. machine intelligence and how AI will affect the future of work. Now we turn our attention to what's going on in the AI market and what this technology could mean for healthcare in particular. AI technology is developing fast. "In 2016," Ardire says, "we are already seeing the emergence of applications for human resources, marketing and communications, sales, customer service, market and risk intelligence and more."
Global Distant Supervision for Relation Extraction
Han, Xianpei (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences)
Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge, which we usually refer to as distant supervision (DS). However, traditional DS methods mostly exploit only one specific kind of indirect supervision knowledge, namely the relations/facts in a given knowledge base, and thus often suffer from a lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate for the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.
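For contrast, the traditional single-source distant supervision described above can be sketched as follows. The sketch assumes that entity pairs are already recognized in each sentence and that the knowledge base is a hypothetical dict keyed by (head, tail). It labels every sentence mentioning a known pair with all of that pair's relations, which is where the label noise comes from. A pair missing from the knowledge base becomes a negative even when the sentence expresses a relation, which illustrates the lack of supervision. The global Markov logic model is not shown.

```python
def distant_label(mentions, kb):
    """mentions: (sentence, head, tail) triples with entities already recognized.
    kb: hypothetical dict mapping (head, tail) -> set of relation names.
    Returns (sentence, head, tail, relations); an empty set marks a negative."""
    return [(sentence, head, tail, set(kb.get((head, tail), ())))
            for sentence, head, tail in mentions]


kb = {("Barack Obama", "Honolulu"): {"born_in"}}
mentions = [
    ("Barack Obama was born in Honolulu.", "Barack Obama", "Honolulu"),
    # Labeled born_in even though this sentence does not express that relation.
    ("Barack Obama visited Honolulu last year.", "Barack Obama", "Honolulu"),
    # Labeled negative because the knowledge base lacks this fact.
    ("Honolulu is the capital of Hawaii.", "Honolulu", "Hawaii"),
]
labeled = distant_label(mentions, kb)
```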
Distant IE by Bootstrapping Using Lists and Document Structure
Bing, Lidong (Carnegie Mellon University) | Ling, Mingyang (Carnegie Mellon University) | Wang, Richard C. (Baidu) | Cohen, William W. (Carnegie Mellon University)
Distant labeling for information extraction (IE) suffers from noisy training data. We describe a way of reducing the noise associated with distant IE by identifying coupling constraints between potential instance labels. As one example of coupling, items in a list are likely to have the same label. A second example of coupling comes from analysis of document structure: in some corpora, sections can be identified such that items in the same section are likely to have the same label. Such sections do not exist in all corpora, but we show that augmenting a large corpus with coupling constraints from even a small, well-structured corpus can improve performance substantially, doubling F1 on one task.
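A simplified sketch of the list-coupling constraint. It uses a majority vote within each list as a stand-in for the paper's constraint handling, which the abstract does not specify. The items, labels, and the min_agreement threshold are hypothetical.

```python
from collections import Counter


def propagate_list_labels(noisy_labels, lists, min_agreement=0.5):
    """noisy_labels: dict item -> label, or None when distant labeling found nothing.
    lists: groups of items that appear together in one document list.
    Every item in a list receives the list's majority label, but only when that
    label holds more than min_agreement of the list's labeled items."""
    result = dict(noisy_labels)
    for items in lists:
        votes = Counter(noisy_labels[i] for i in items
                        if noisy_labels.get(i) is not None)
        if not votes:
            continue
        label, count = votes.most_common(1)[0]
        if count / sum(votes.values()) > min_agreement:
            for i in items:
                result[i] = label
    return result


labels = {"aspirin": "drug", "ibuprofen": "drug", "naproxen": None,
          "headache": "symptom"}
lists = [["aspirin", "ibuprofen", "naproxen"], ["aspirin", "headache"]]
out = propagate_list_labels(labels, lists)
```

Here naproxen inherits "drug" from its list. The second list is tied one vote to one, so it changes nothing and headache keeps "symptom".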
Inferring Interpersonal Relations in Narrative Summaries
Srivastava, Shashank (Carnegie Mellon University) | Chaturvedi, Snigdha (University of Maryland, College Park) | Mitchell, Tom (Carnegie Mellon University)
Characterizing relationships between people is fundamental for the understanding of narratives. In this work, we address the problem of inferring the polarity of relationships between people in narrative summaries. We formulate the problem as a joint structured prediction for each narrative, and present a general model that combines evidence from linguistic and semantic features, as well as features based on the structure of the social community in the text. We additionally provide a clustering-based approach that can exploit regularities in narrative types, e.g., learning an affinity for love triangles in romantic stories. On a dataset of movie summaries from Wikipedia, our structured models provide more than 30% error reduction over a competitive baseline that considers pairs of characters in isolation.
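To illustrate how the structure of the social community can enter a joint prediction, the sketch below scores every polarity assignment for a small cast of characters and adds a bonus for balanced triangles. The balance term, the triad_weight parameter, and the brute-force search are assumptions made for illustration, not the paper's model. Exhaustive search is only feasible for a handful of characters.

```python
from itertools import combinations, product


def joint_polarity(characters, local_scores, triad_weight=1.0):
    """local_scores: dict frozenset({a, b}) -> float; positive favors a friendly
    relation (+1), negative favors a hostile one (-1), missing pairs count as 0.
    Returns the +1/-1 assignment over all pairs maximizing the local scores plus
    triad_weight * (number of balanced triangles - number of unbalanced ones)."""
    pairs = [frozenset(p) for p in combinations(characters, 2)]
    triangles = list(combinations(characters, 3))
    best, best_score = None, float("-inf")
    # Exhaustive search: 2 ** len(pairs) assignments, so only for tiny casts.
    for signs in product((1, -1), repeat=len(pairs)):
        assign = dict(zip(pairs, signs))
        score = sum(assign[p] * local_scores.get(p, 0.0) for p in pairs)
        for a, b, c in triangles:
            # A triangle is balanced when the product of its three signs is +1.
            sign = (assign[frozenset((a, b))] * assign[frozenset((b, c))]
                    * assign[frozenset((a, c))])
            score += triad_weight * sign
        if score > best_score:
            best, best_score = assign, score
    return best


cast = ["Ann", "Ben", "Cal"]
scores = {frozenset(("Ann", "Ben")): 2.0,    # strong evidence: lovers
          frozenset(("Ann", "Cal")): -2.0,   # strong evidence: rivals
          frozenset(("Ben", "Cal")): 0.1}    # weak, slightly positive evidence
result = joint_polarity(cast, scores)
```

Considered in isolation, Ben and Cal would be labeled positive. The joint model flips them to negative because that makes the triangle balanced.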
Learning with Marginalized Corrupted Features and Labels Together
Li, Yingming (University of Electronic Science and Technology of China) | Yang, Ming (State University of New York at Binghamton) | Xu, Zenglin (University of Electronic Science and Technology of China) | Zhang, Zhongfei (Mark) (State University of New York at Binghamton)
Tagging has become increasingly important in many real-world applications, notably web applications such as web blogs and resource-sharing systems. Despite this importance, tagging methods often face difficult challenges such as limited training samples and incomplete labels, which usually degrade tag-prediction performance. To improve generalization performance, in this paper we propose Regularized Marginalized Cross-View learning (RMCV), which jointly models attribute noise and label noise. In more detail, the proposed model constructs infinitely many training examples with attribute noise drawn from known exponential-family distributions and exploits label noise via a marginalized denoising autoencoder. The model therefore benefits from this robustness and alleviates the problem of tag sparsity. While RMCV is a general method for learning tagging, in the evaluations we focus on the specific application of multi-label text tagging. Extensive evaluations on three benchmark data sets demonstrate that RMCV achieves superior performance compared with state-of-the-art methods.
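The idea of training on infinitely many corrupted copies can be shown on a simpler case than RMCV: a linear model with squared loss under unbiased blank-out noise. Each feature is dropped with probability q and otherwise scaled by 1/(1 - q). The squared loss and the blank-out noise are assumptions; RMCV instead uses exponential-family attribute noise and adds a label-noise component. Under these assumptions, the expected loss over all corrupted copies has a closed form, which the sketch checks against exhaustive enumeration.

```python
from itertools import product


def expected_loss_closed_form(w, x, y, q):
    """E[(w . x_tilde - y)^2] when each x_d is zeroed with probability q and
    otherwise scaled by 1 / (1 - q), so that E[x_tilde] = x."""
    residual = sum(wd * xd for wd, xd in zip(w, x)) - y
    # Var[x_tilde_d] = x_d^2 * q / (1 - q); features are corrupted independently.
    variance = q / (1 - q) * sum((wd * xd) ** 2 for wd, xd in zip(w, x))
    return residual ** 2 + variance


def expected_loss_enumerated(w, x, y, q):
    """The same expectation, summed over all 2 ** D corruption patterns."""
    total = 0.0
    for keep in product((True, False), repeat=len(x)):
        prob, pred = 1.0, 0.0
        for wd, xd, kept in zip(w, x, keep):
            prob *= (1 - q) if kept else q
            if kept:
                pred += wd * xd / (1 - q)
        total += prob * (pred - y) ** 2
    return total


w, x, y, q = [0.5, -1.0, 2.0], [1.0, 2.0, 0.5], 0.3, 0.25
closed = expected_loss_closed_form(w, x, y, q)
enumerated = expected_loss_enumerated(w, x, y, q)
```

Both values are about 2.39. Minimizing the closed form over w trains on every corrupted copy at once without sampling any of them. The extra term acts as a data-dependent, ridge-like regularizer.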
Learning Abductive Reasoning Using Random Examples
Juba, Brendan (Washington University in St. Louis)
We consider a new formulation of abduction in which degrees of "plausibility" of explanations, along with the rules of the domain, are learned from concrete examples (settings of attributes). Our version of abduction thus falls in the "learning to reason" framework of Khardon and Roth. Such approaches enable us to capture a natural notion of "plausibility" in a domain while avoiding the extremely difficult problem of specifying an explicit representation of what is "plausible." We specifically consider the question of which syntactic classes of formulas have efficient algorithms for abduction. We find that explanations in the class of k-DNF can be found in polynomial time for any fixed k; however, we also find evidence that even weak versions of our abduction task are intractable for the usual class of conjunctions. This evidence is provided by a connection to the usual, inductive PAC-learning model proposed by Valiant. We also consider an exception-tolerant variant of abduction. We observe that polynomial-time algorithms can tolerate a few adversarially chosen exceptions, again for the class of k-DNF explanations. All of the algorithms we study are particularly simple, and indeed are variants of a rule proposed by Mill.
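A sketch of an elimination-style procedure for the noise-free k-DNF case. It assumes examples are dicts of boolean attributes and that the observation to be explained is one of those attributes. Every term of at most k literals that is true on some example where the observation fails is eliminated. The surviving terms form the explanation if they cover enough examples. The min_coverage plausibility threshold and the attribute names are hypothetical, and the exception-tolerant variant is not shown.

```python
from itertools import combinations, product


def terms_up_to_k(attributes, k):
    """All conjunctions of 1..k literals; a literal is (attribute, required_value)."""
    for size in range(1, k + 1):
        for attrs in combinations(attributes, size):
            for values in product((True, False), repeat=size):
                yield tuple(zip(attrs, values))


def satisfies(example, term):
    return all(example[attr] == value for attr, value in term)


def abduce_kdnf(examples, target, k, min_coverage):
    """Return the surviving terms (read as their disjunction) or None."""
    attributes = sorted(a for a in examples[0] if a != target)
    negatives = [e for e in examples if not e[target]]
    # Eliminate any term that holds on an example where the target is false.
    kept = [t for t in terms_up_to_k(attributes, k)
            if not any(satisfies(e, t) for e in negatives)]
    covered = sum(1 for e in examples if any(satisfies(e, t) for t in kept))
    if not kept or covered / len(examples) < min_coverage:
        return None
    return kept


examples = [
    {"rain": True, "sprinkler": False, "wet": True},
    {"rain": False, "sprinkler": True, "wet": True},
    {"rain": False, "sprinkler": False, "wet": False},
    {"rain": True, "sprinkler": True, "wet": True},
]
explanation = abduce_kdnf(examples, "wet", k=1, min_coverage=0.5)
```

The result explains "wet" as rain OR sprinkler. Any k-DNF consistent with the examples is built only from surviving terms. The disjunction of all survivors therefore covers at least as many examples as any such explanation, so checking its coverage is enough.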
Aggregating Inter-Sentence Information to Enhance Relation Extraction
Zheng, Hao (Beihang University) | Li, Zhoujun (Beihang University) | Wang, Senzhang (Beihang University) | Yan, Zhao (Beihang University) | Zhou, Jianshe (Capital Normal University)
Previous work on relation extraction from free text relies mainly on intra-sentence information. Because relations may be mentioned across sentences, inter-sentence information can be leveraged to improve distantly supervised relation extraction. To effectively exploit inter-sentence information, we propose a ranking-based approach, which first learns a scoring function based on a listwise learning-to-rank model and then uses it for multi-label relation extraction. Experimental results verify the effectiveness of our method for aggregating information across sentences. Additionally, to further improve the ranking of high-quality extractions, we propose an effective method to rank relations from different entity pairs. This method can be easily integrated into our overall relation extraction framework and significantly boosts precision.
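The abstract does not name the listwise model. The sketch below assumes ListNet's top-one probability loss as one well-known listwise learning-to-rank objective. The scores and relevance values for three candidate relations of one entity pair are hypothetical.

```python
import math


def top_one_probabilities(scores):
    """Softmax over a list's scores: the probability that each item is ranked first."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted, relevance):
    """Cross-entropy between the top-one distributions of the ground truth and the model."""
    target = top_one_probabilities(relevance)
    model = top_one_probabilities(predicted)
    return -sum(t * math.log(m) for t, m in zip(target, model))


# Hypothetical relevance for three candidate relations of one entity pair.
relevance = [2.0, 1.0, 0.0]
good = listnet_loss([2.0, 1.0, 0.0], relevance)   # ordering agrees with the truth
bad = listnet_loss([0.0, 1.0, 2.0], relevance)    # ordering is reversed
```

The loss is smallest when the predicted distribution matches the target distribution, so training pushes the scoring function toward the correct ordering of the whole list rather than toward independent per-label decisions.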
A Semi-Supervised Learning Approach to Why-Question Answering
Oh, Jong-Hoon (National Institute of Information and Communications Technology) | Torisawa, Kentaro (National Institute of Information and Communications Technology) | Hashimoto, Chikara (National Institute of Information and Communications Technology) | Iida, Ryu (National Institute of Information and Communications Technology) | Tanaka, Masahiro (National Institute of Information and Communications Technology) | Kloetzer, Julien (National Institute of Information and Communications Technology)
We propose a semi-supervised learning method for improving why-question answering (why-QA). The key to our method is to generate training data (question-answer pairs) from causal relations in texts such as "[Tsunamis are generated](effect) because [the ocean's water mass is displaced by an earthquake](cause)." A naive method for the generation would be to make a question-answer pair by simply converting the effect part of the causal relation into a why-question, like "Why are tsunamis generated?" from the above example, and using the source text of the causal relation as an answer. However, in our preliminary experiments, this naive method actually failed to improve why-QA performance. The main reasons were that the machine-generated questions were often incomprehensible, like "Why does (it) happen?", and that the system overfitted to the results of our automatic causality recognizer. Hence, we developed a novel method that effectively filters out incomprehensible questions and retrieves from texts answers that are likely to be paraphrases of a given causal relation. Through a series of experiments, we showed that our approach significantly improved the precision of the top answer by 8% over the current state-of-the-art system for Japanese why-QA.
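The failure mode described above can be illustrated with a toy English example. Both the template, which only handles clauses of the form "<subject> <be-verb> <rest>", and the pronoun filter are hypothetical heuristics. The paper's filtering and answer-retrieval methods for Japanese are not reproduced here.

```python
BE_VERBS = {"is", "are", "was", "were"}
PRONOUNS = {"it", "this", "that", "they", "he", "she", "these", "those"}


def naive_why_question(effect):
    """Turn '<subject> <be-verb> <rest>' into 'Why <be-verb> <subject> <rest>?'.
    Returns None for clauses this toy template cannot handle."""
    words = effect.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in BE_VERBS and i > 0:
            subject, rest = " ".join(words[:i]), " ".join(words[i + 1:])
            return f"Why {word.lower()} {subject} {rest}?"
    return None


def looks_comprehensible(question):
    """Reject questions whose subject starts with a pronoun or demonstrative,
    since the antecedent is lost once the clause is taken out of context."""
    words = question.rstrip("?").split()
    return len(words) >= 3 and words[2].lower() not in PRONOUNS


q_good = naive_why_question("tsunamis are generated")   # "Why are tsunamis generated?"
q_bad = naive_why_question("it is generated")           # "Why is it generated?"
kept = [q for q in (q_good, q_bad) if q and looks_comprehensible(q)]
```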