Never-Ending Learning
Mitchell, Tom M. (Carnegie Mellon University) | Cohen, William (Carnegie Mellon University) | Hruschka, Estevam (University of Sao Carlos) | Talukdar, Partha (Indian Institute of Science) | Betteridge, Justin (Carnegie Mellon University) | Carlson, Andrew (Google) | Mishra, Bhavana Dalvi (Carnegie Mellon University) | Gardner, Matthew (Carnegie Mellon University) | Kisiel, Bryan (Carnegie Mellon University) | Krishnamurthy, Jayant (Carnegie Mellon University) | Lao, Ni (Google) | Mazaitis, Kathryn (Carnegie Mellon University) | Mohamed, Thahir (Carnegie Mellon University) | Nakashole, Ndapa (Carnegie Mellon University) | Platanios, Emmanouil Antonios (Ohio State University) | Ritter, Alan (Carnegie Mellon University) | Samadi, Mehdi (Duolingo) | Settles, Burr (Carnegie Mellon University) | Wang, Richard (Carnegie Mellon University) | Wijaya, Derry (Carnegie Mellon University) | Gupta, Abhinav (Carnegie Mellon University) | Chen, Xinlei (Alpine Data Lab) | Saparov, Abulhair (Pittsburgh Supercomputer Center) | Greaves, Malcolm | Welling, Joel
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits) ). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.
Contrastive Unsupervised Word Alignment with Non-Local Features
Liu, Yang (Tsinghua University) | Sun, Maosong (Tsinghua University)
Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of non-local features that are critical for capturing the divergence between natural languages. We propose a contrastive approach that aims to differentiate observed training examples from noise. It not only introduces prior knowledge to guide unsupervised learning but also cancels out partition functions. Based on the observation that the probability mass of log-linear models for word alignment is usually highly concentrated, we propose to use top-$n$ alignments to approximate the expectations with respect to posterior distributions. This allows for efficient and accurate calculation of expectations of non-local features. Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods.
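The top-$n$ approximation described above can be illustrated with a small sketch. It assumes each candidate alignment already has a log-linear score (the dot product of weights and features); the function name `topn_expectation`, the `feature_fn` callback and the candidate list are hypothetical and are not taken from the paper. Only the n best candidates are renormalized and used to compute the feature expectation.

```python
import math

def topn_expectation(scored_alignments, feature_fn, n):
    """Approximate the posterior expectation of a feature vector.

    scored_alignments: list of (alignment, log_linear_score) pairs (assumed input).
    feature_fn: maps an alignment to a list of feature values (hypothetical helper).
    n: number of highest-scoring alignments to keep (n >= 1).
    """
    # Keep only the n best-scoring alignments.
    top = sorted(scored_alignments, key=lambda pair: pair[1], reverse=True)[:n]
    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = top[0][1]
    weights = [math.exp(score - max_score) for _, score in top]
    z = sum(weights)  # normalizer restricted to the top-n set
    dim = len(feature_fn(top[0][0]))
    expectation = [0.0] * dim
    for (alignment, _), w in zip(top, weights):
        feats = feature_fn(alignment)
        for k in range(dim):
            expectation[k] += (w / z) * feats[k]
    return expectation

# Toy usage: three candidate alignments with one indicator feature.
candidates = [("a", 10.0), ("b", 0.0), ("c", -5.0)]
indicator = lambda alignment: [1.0 if alignment == "a" else 0.0]
approx = topn_expectation(candidates, indicator, n=2)
```

When the probability mass is concentrated on a few alignments, as the abstract observes, dropping the remaining candidates changes the expectation very little.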
Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue
Liu, Changsong (Michigan State University) | Chai, Joyce Yue (Michigan State University)
In human-robot dialogue, although a robot and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the robot to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). To overcome this problem, we have developed an optimization based approach that allows the robot to detect and adapt to perceptual differences. Through online interaction with the human, the robot can learn a set of weights indicating how reliably/unreliably each dimension (e.g., object type, object color, etc.) of its perception of the environment maps to the human's linguistic descriptors and thus adjust its word models accordingly. Our empirical evaluation has shown that this weight-learning approach can successfully adjust the weights to reflect the robot's perceptual limitations. The learned weights, together with updated word models, can lead to a significant improvement for referential grounding in future dialogues.
Recurrent Convolutional Neural Networks for Text Classification
Lai, Siwei (Chinese Academy of Sciences) | Xu, Liheng (Chinese Academy of Sciences) | Liu, Kang (Chinese Academy of Sciences) | Zhao, Jun (Chinese Academy of Sciences)
Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.
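As a rough illustration of the max-pooling step described above, the sketch below takes one vector per word and keeps, for every dimension, the largest value across the text, together with the index of the word that supplied it. The function name and the toy vectors are hypothetical; the recurrent structure that would produce the word representations is not shown.

```python
def max_pool_with_sources(word_vectors):
    """Element-wise max over word vectors, plus the index of the winning word.

    word_vectors: list of equal-length lists, one per word (assumed input).
    Returns (pooled_vector, source_indices).
    """
    pooled, sources = [], []
    # zip(*...) iterates over dimensions: each column holds one value per word.
    for column in zip(*word_vectors):
        best = max(range(len(column)), key=lambda i: column[i])
        pooled.append(column[best])
        sources.append(best)
    return pooled, sources

# Toy usage: three words, two-dimensional representations.
vectors = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.4]]
pooled, sources = max_pool_with_sources(vectors)
```

The `sources` list shows which word dominates each dimension, which is one simple way to inspect which words the pooled representation relies on.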
Dataless Text Classification with Descriptive LDA
Chen, Xingyuan (Leshan Normal University) | Xia, Yunqing (Tsinghua University) | Jin, Peng (Leshan Normal University) | Carroll, John (University of Sussex)
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser
Chaplot, Devendra Singh (Samsung Electronics Co., Ltd.) | Bhattacharyya, Pushpak (IIT Bombay) | Paranjape, Ashwin (Stanford University)
Word Sense Disambiguation (WSD) is a difficult problem to solve in the unsupervised setting, because inference becomes more dependent on the interplay between different senses in the context when learning resources are unavailable. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) Inference Query on a Markov Random Field (MRF) built using WordNet and Link Parser or Stanford Parser. To the best of our knowledge this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats the state-of-the-art system on the SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.
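To make the MAP formulation above concrete, here is a minimal brute-force sketch over a tiny graph. It assumes log-potentials are already available: a per-sense score for each word and a pairwise relatedness score for senses of words linked by a dependency edge. All names, scores and the exhaustive search are illustrative assumptions; the abstract does not specify the paper's potentials or inference procedure.

```python
from itertools import product

def map_assignment(senses, unary, pairwise, edges):
    """Exhaustively find the highest-scoring joint sense assignment.

    senses: dict word -> list of candidate senses.
    unary: dict word -> dict sense -> log-potential (assumed given).
    pairwise: function (sense_a, sense_b) -> log-potential (assumed given).
    edges: list of (word_a, word_b) pairs, e.g. taken from a dependency parse.
    """
    words = list(senses)
    best, best_score = None, float("-inf")
    for combo in product(*(senses[w] for w in words)):
        assignment = dict(zip(words, combo))
        score = sum(unary[w][assignment[w]] for w in words)
        score += sum(pairwise(assignment[a], assignment[b]) for a, b in edges)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Toy usage with hypothetical senses and scores.
senses = {
    "bank": ["bank#river", "bank#finance"],
    "deposit": ["deposit#money", "deposit#sediment"],
}
unary = {
    "bank": {"bank#river": 0.0, "bank#finance": 0.5},
    "deposit": {"deposit#money": 0.5, "deposit#sediment": 0.0},
}
relatedness = {
    ("bank#finance", "deposit#money"): 1.0,
    ("bank#river", "deposit#sediment"): 1.2,
}
pairwise = lambda a, b: relatedness.get((a, b), 0.0)
best, best_score = map_assignment(senses, unary, pairwise, [("bank", "deposit")])
```

Exhaustive search is exponential in the number of words, so it only serves to show what the MAP query computes; a practical system would need a more efficient inference method.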
A Novel Neural Topic Model and Its Supervised Extension
Cao, Ziqiang (Peking University) | Li, Sujian (Peking University) | Liu, Yang (Peking University) | Li, Wenjie (Hong Kong Polytechnic University) | Ji, Heng (Rensselaer Polytechnic Institute)
Topic modeling techniques have the benefits of modeling words and documents uniformly under a probabilistic framework. However, they also suffer from the limitations of sensitivity to initialization and unigram topic distribution, which can be remedied by deep learning techniques. To explore the combination of topic modeling and deep learning techniques, we first explain the standard topic model from the perspective of a neural network. Based on this, we propose a novel neural topic model (NTM) where the representations of words and documents are efficiently and naturally combined into a uniform framework. Extending from NTM, we can easily add a label layer and propose the supervised neural topic model (sNTM) to tackle supervised tasks. Experiments show that our models are competitive in both topic discovery and classification/regression tasks.
Predicting Peer-to-Peer Loan Rates Using Bayesian Non-Linear Regression
Bitvai, Zsolt (University of Sheffield) | Cohn, Trevor (University of Melbourne)
Peer-to-peer lending is a new highly liquid market for debt, which is rapidly growing in popularity. Here we consider modelling market rates, developing a non-linear Gaussian Process regression method which incorporates both structured data and unstructured text from the loan application. We show that the peer-to-peer market is predictable, and identify a small set of key factors with high predictive power. Our approach outperforms baseline methods for predicting market rates, and generates substantial profit in a trading simulation.
Learning Entity and Relation Embeddings for Knowledge Graph Completion
Lin, Yankai (Tsinghua University) | Liu, Zhiyuan (Tsinghua University) | Sun, Maosong (Tsinghua University) | Liu, Yang (Samsung Research and Development Institute of China) | Zhu, Xuan (Samsung Research and Development Institute of China)
Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH.
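The project-then-translate idea described above can be sketched as follows. The sketch assumes a relation-specific projection matrix that maps entity vectors into the relation space, and uses the squared Euclidean distance between the translated head and the tail as the score (lower means more plausible); the function names, dimensions and toy values are illustrative assumptions, and training is not shown.

```python
def project(matrix, vector):
    """Map an entity vector (length d_e) into relation space with a d_r x d_e matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transr_score(head, relation, tail, projection):
    """Squared distance between (projected head + relation) and projected tail."""
    h_r = project(projection, head)
    t_r = project(projection, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))

# Toy usage: 3-dimensional entity space, 2-dimensional relation space.
projection = [[1, 0, 0],
              [0, 1, 0]]          # hypothetical relation-specific matrix
head = [1, 2, 3]
relation = [1, 0]
good_tail = [2, 2, 9]             # differs from head + relation only in the ignored dimension
bad_tail = [0, 0, 0]
good = transr_score(head, relation, good_tail, projection)
bad = transr_score(head, relation, bad_tail, projection)
```

In the toy example the third entity dimension is discarded by the projection, which illustrates how a relation can focus on only some aspects of an entity.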
Automatically Creating a Large Number of New Bilingual Dictionaries
Lam, Khang Nhut (University of Colorado, Colorado Springs) | Tarouti, Feras Al (University of Colorado, Colorado Springs) | Kalita, Jugal (University of Colorado, Colorado Springs)
This paper proposes approaches to automatically create a large number of new bilingual dictionaries for low resource languages, especially resource-poor and endangered languages, from a single input bilingual dictionary. Our algorithms produce translations of words in a source language to plentiful target languages using available Wordnets and a machine translator (MT). Since our approaches rely on just one input dictionary, available Wordnets and an MT, they are applicable to any bilingual dictionary as long as one of the two languages is English or has a Wordnet linked to the Princeton Wordnet. Starting with 5 available bilingual dictionaries, we create 48 new bilingual dictionaries. Of these, 30 pairs of languages are not supported by the popular MTs: Google and Bing.
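One way such pivoting through linked Wordnets could look is sketched below. The abstract does not detail the paper's algorithms, so everything here is an assumption made for illustration: the function name, the three lookup tables, the synset identifiers and the toy entries are all hypothetical, and the machine-translation step is omitted.

```python
def pivot_translate(source_to_english, english_to_synsets, synset_to_target):
    """Build a source-to-target dictionary by pivoting through shared synset IDs.

    source_to_english: dict source word -> list of English translations
                       (the single input bilingual dictionary).
    english_to_synsets: dict English word -> list of synset IDs (assumed lookup).
    synset_to_target: dict synset ID -> list of target-language words
                      (a target Wordnet linked to the same synset IDs).
    """
    result = {}
    for word, english_words in source_to_english.items():
        candidates = set()
        for en in english_words:
            for synset in english_to_synsets.get(en, ()):
                candidates.update(synset_to_target.get(synset, ()))
        result[word] = sorted(candidates)
    return result

# Toy usage with made-up entries and made-up synset IDs.
source_to_english = {"src_water": ["water"], "src_unknown": ["zzz"]}
english_to_synsets = {"water": ["syn-001", "syn-002"]}
synset_to_target = {"syn-001": ["agua"], "syn-002": ["agua", "regar"]}
new_dictionary = pivot_translate(source_to_english, english_to_synsets, synset_to_target)
```

Pivoting through every synset of an ambiguous English word can introduce wrong translations, so a real pipeline would need some ranking or filtering of the candidates.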