Statistical Learning
A Probabilistic Covariate Shift Assumption for Domain Adaptation
Adel, Tameem (University of Waterloo. Radboud University.) | Wong, Alexander (University of Waterloo.)
The aim of domain adaptation algorithms is to establish a learner, trained on labeled data from a source domain, that can classify samples from a target domain, in which few or no labeled data are available for training. Covariate shift, a primary assumption in several works on domain adaptation, assumes that the labeling functions of source and target domains are identical. We present a domain adaptation algorithm that assumes a relaxed version of covariate shift where the assumption that the labeling functions of the source and target domains are identical holds with a certain probability. Assuming a source deterministic large margin binary classifier, the farther a target instance is from the source decision boundary, the higher the probability that covariate shift holds. In this context, given a target unlabeled sample and no target labeled data, we develop a domain adaptation algorithm that bases its labeling decisions both on the source learner and on the similarities between the target unlabeled instances. The source labeling function decisions associated with probabilistic covariate shift, along with the target similarities are concurrently expressed on a similarity graph. We evaluate our proposed algorithm on a benchmark sentiment analysis (and domain adaptation) dataset, where state-of-the-art adaptation results are achieved. We also derive a lower bound on the performance of the algorithm.
Topical Word Embeddings
Liu, Yang (Tsinghua University) | Liu, Zhiyuan (Tsinghua University) | Chua, Tat-Seng (National University of Singapore) | Sun, Maosong (Tsinghua University)
Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification.
Gazetteer-Independent Toponym Resolution Using Geographic Word Profiles
DeLozier, Grant (The University of Texas at Austin) | Baldridge, Jason (The University of Texas at Austin) | London, Loretta (The University of Texas at Austin)
Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
Target-Dependent Churn Classification in Microblogs
Amiri, Hadi (University of Maryland) | III, Hal Daume (University of Maryland)
In particular, we investigate demographic business. Banks, telecommunication companies, airlines, Internet churn indicators (obtained from users of microposts), service providers, pay TV companies, and insurance content churn indicators (obtained from the textual firms etc., utilize customer churn or attrition rates as one of content of micro-posts), and context churn indicators (obtained their key business metrics. This metric is important as the from threads containing the micro-posts). We examine churn rate of a business is a good indicator of customer response factors that make this problem more challenging and investigate to services, pricing, and competitions. The ability to the performance of several state-of-the-art machine identify churny contents / behaviors can enable early intervention learning techniques on this problem. A challenging aspect processes (as part of retention campaigns) and ultimately of such classification task is that churny contents can be expressed a reduction in customer churn.
Ordering-Sensitive and Semantic-Aware Topic Modeling
Yang, Min (The University of Hong Kong) | Cui, Tianyi (Zhejiang University) | Tu, Wenting (The University of Hong Kong)
Topic modeling of textual corpora is an important and challenging problem. In most previous work, the โbag-of-wordsโ assumption is usually made which ignores the ordering of words. This assumption simplifies the computation, but it unrealistically loses the ordering information and the semantic of words in the context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vector of its surrounding words and the context. The Gaussian mixture components and the topic of documents, sentences and words can be learnt jointly. Extensive experiments show that our model can learn better topics and more accurate word distributions for each topic. Quantitatively, comparing to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy and classification accuracy.
Contrastive Unsupervised Word Alignment with Non-Local Features
Liu, Yang (Tsinghua University) | Sun, Maosong (Tsinghua University)
Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of non-local features that are critical for capturing the divergence between natural languages. We propose a contrastive approach that aims to differentiate observed training examples from noises. It not only introduces prior knowledge to guide unsupervised learning but also cancels out partition functions. Based on the observation that the probability mass of log-linear models for word alignment is usually highly concentrated, we propose to use top-$n$ alignments to approximate the expectations with respect to posterior distributions. This allows for efficient and accurate calculation of expectations of non-local features. Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods.
Joint Anaphoricity Detection and Coreference Resolution with Constrained Latent Structures
Lassalle, Emmanuel (Universitรฉ Paris-Diderot) | Denis, Pascal (INRIA)
This paper introduces a new structured model for learning anaphoricity detection and coreference resolution in a joint fashion. Specifically,we use a latent tree to represent the full coreference and anaphoric structure of a document at a global level, and we jointly learn the parameters of the two models using a version of the structured perceptron algorithm. Our joint structured model is further refined by the use of pairwise constraints which help the model to capture accurately certain patterns of coreference. Our experiments on the CoNLL-2012 English datasets show large improvements in both coreference resolution and anaphoricity detection, compared to various competing architectures. Our best coreference system obtains a CoNLL score of 81.97 on gold mentions, which is to date the best score reported on this setting.
A Stratified Strategy for Efficient Kernel-Based Learning
Filice, Simone (University of Roma Tor Vergata) | Croce, Danilo (University of Roma Tor Vergata) | Basili, Roberto (University of Roma Tor Vergata)
In Kernel-based Learning the targeted phenomenon is summarized by a set of explanatory examples derived from the training set. When the model size grows with the complexity of the task, such approaches are so computationally demanding that the adoption of comprehensive models is not always viable.In this paper, a general framework aimed at minimizing this problem is proposed: multiple classifiers are stratified and dynamically invoked according to increasing levels of complexity corresponding to incrementally more expressive representation spaces.Computationally expensive inferences are thus adopted only when the classification at lower levels is too uncertain over an individual instance. The application of complex functions is thus avoided where possible, with a significant reduction of the overall costs. The proposed strategy has been integrated within two well-known algorithms: Support Vector Machines and Passive-Aggressive Online classifier.A significant cost reduction (up to 90%), with a negligible performance drop, is observed against two Natural Language Processing tasks, i.e. Question Classification and Sentiment Analysis in Twitter.
Dataless Text Classification with Descriptive LDA
Chen, Xingyuan (Leshan Normal University) | Xia, Yunqing (Tsinghua University) | Jin, Peng (Leshan Normal University) | Carroll, John (University of Sussex)
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
Predicting Peer-to-Peer Loan Rates Using Bayesian Non-Linear Regression
Bitvai, Zsolt (University of Sheffield) | Cohn, Trevor (University of Melbourne)
Peer-to-peer lending is a new highly liquid market for debt, which is rapidly growing in popularity. Here we consider modelling market rates, developing a non-linear Gaussian Process regression method which incorporates both structured data and unstructured text from the loan application. We show that the peer-to-peer market is predictable, and identify a small set of key factors with high predictive power. Our approach outperforms baseline methods for predicting market rates, and generates substantial profit in a trading simulation.