Goto

Collaborating Authors

 Text Classification


Learning Multiple Tasks in Parallel with a Shared Annotator

Neural Information Processing Systems

We introduce a new multi-task framework, in which $K$ online learners are sharing a single annotator with limited bandwidth. On each round, each of the $K$ learners receives an input, and makes a prediction about the label of that input. Then, a shared (stochastic) mechanism decides which of the $K$ inputs will be annotated. The learner that receives the feedback (label) may update its prediction rule, and we proceed to the next round. We develop an online algorithm for multi-task binary classification that learns in this setting, and bound its performance in the worst-case setting. Additionally, we show that our algorithm can be used to solve two bandits problems: contextual bandits, and dueling bandits with context, both allowed to decouple exploration and exploitation. Empirical study with OCR data, vowel prediction (VJ project) and document classification, shows that our algorithm outperforms other algorithms, one of which uses uniform allocation, and essentially makes more (accuracy) for the same labour of the annotator.


A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

Neural Information Processing Systems

In this paper we propose a general framework for learning distributed representations of attributes: characteristics of text whose representations can be jointly learned with word embeddings. Attributes can correspond to a wide variety of concepts, such as document indicators (to learn sentence vectors), language indicators (to learn distributed language representations), meta-data and side information (such as the age, gender and industry of a blogger) or representations of authors. We describe a third-order model where word context and attribute vectors interact multiplicatively to predict the next word in a sequence. This leads to the notion of conditional word similarity: how meanings of words change when conditioned on different attributes. We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation.


Domain-Specific Sentiment Classification for Games-Related Tweets

AAAI Conferences

Sentiment classification provides information about the author's feeling toward a topic through the use of expressive words. However, words indicative of a particular sentiment class can be domain-specific. We train a text classifier for Twitter data related to games using labels inferred from emoticons. Our classifier is able to differentiate between positive and negative sentiment tweets labeled by emoticons with 75.1% accuracy. Additionally, we test the classifier on human-labeled examples with the additional case of neutral or ambiguous sentiment. Finally, we have made the data available to the community for further use and analysis.


On Dataless Hierarchical Text Classification

AAAI Conferences

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.


Automated Classification of Stance in Student Essays: An Approach Using Stance Target Information and the Wikipedia Link-Based Measure

AAAI Conferences

We present a new approach to the automated classification of document-level argument stance, a relatively under-researched sub-task of Sentiment Analysis. In place of the noisy online debate data currently used in stance classification research, a corpus of student essays annotated for essay-level stance is constructed for use in a series of classification experiments. A novel set of features designed to capture the stance, stance targets, and topical relationships between the essay prompt and the student's essay is described. Models trained on this feature set showed significant increases in accuracy relative to two high baselines.


Documents as multiple overlapping windows into grids of counts

Neural Information Processing Systems

In text analysis documents are represented as disorganized bags of words, models of count features are typically based on mixing a small number of topics \cite{lda,sam}. Recently, it has been observed that for many text corpora documents evolve into one another in a smooth way, with some features dropping and new ones being introduced. The counting grid \cite{cgUai} models this spatial metaphor literally: it is multidimensional grid of word distributions learned in such a way that a document's own distribution of features can be modeled as the sum of the histograms found in a window into the grid. The major drawback of this method is that it is essentially a mixture and all the content much be generated by a single contiguous area on the grid. This may be problematic especially for lower dimensional grids. In this paper, we overcome to this issue with the \emph{Componential Counting Grid} which brings the componential nature of topic models to the basic counting grid. We also introduce a generative kernel based on the document's grid usage and a visualization strategy useful for understanding large text corpora. We evaluate our approach on document classification and multimodal retrieval obtaining state of the art results on standard benchmarks.


Active Learning for Cross-domain Sentiment Classification

AAAI Conferences

In the literature, various approaches have been proposedto address the domain adaptation problem in sentiment classification (also called cross-domainsentiment classification). However, the adaptation performance normally much suffers when the data distributionsin the source and target domains differ significantly. In this paper, we suggest to perform activelearning for cross-domain sentiment classification by actively selecting a smallamount of labeled data in the target domain. Accordingly, we propose an novel activelearning approach for cross-domain sentiment classification. First, we traintwo individual classifiers, i.e., the source and target classifiers with thelabeled data from the source and target respectively. Then, the two classifiersare employed to select informative samples with the selection strategy of QueryBy Committee (QBC). Third, the two classifier is combined to make theclassification decision. Importantly, the two classifiers are trained by fullyexploiting the unlabeled data in the target domain with the label propagation(LP) algorithm. Empirical studies demonstrate the effectiveness of our active learning approach for cross-domainsentiment classification over some strong baselines.


Concept Learning for Cross-Domain Text Classification: A General Probabilistic Framework

AAAI Conferences

Cross-domain learning targets at leveraging the knowledge from source domains to train accurate models for the test data from target domains with different but related data distributions. To tackle the challenge of data distribution difference in terms of raw features, previous works proposed to mine high-level concepts (e.g., word clusters) across data domains, which shows to be more appropriate for classification. However, all these works assume that the same set of concepts are shared in the source and target domains in spite that some distinct concepts may exist only in one of the data domains. Thus, we need a general framework, which can incorporate both shared and distinct concepts, for cross-domain classification. To this end, we develop a probabilistic model, by which both the shared and distinct concepts can be learned by the EM process which optimizes the data likelihood. To validate the effectiveness of this model we intentionally construct the classification tasks where the distinct concepts exist in the data domains. The systematic experiments demonstrate the superiority of our model over all compared baselines, especially on those much more challenging tasks.


A Multi-Label Classification Approach for Coding Cancer Information Service Chat Transcripts

AAAI Conferences

National Cancer Institute's (NCI) Cancer Information Service (CIS) offers online instant messaging based information service called LiveHelp to patients, family members, friends, and other cancer information consumers. A cancer information specialist (IS) 'chats' with a consumer and provides information on a variety of topics including clinical trials. After a LiveHelp chat session is finished, the IS codes about 20 different elements of metadata about the session in electronic contact record forms (ECRF), which are to be later used for quality control and reporting. Besides straightforward elements like age and gender, more specific elements to be coded include the purpose of contact, the subjects of interaction, and the different responses provided to the consumer, the latter two often taking on multiple values. As such, ECRF coding is a time consuming task and automating this process could help ISs to focus more on their primary goal of helping consumers with valuable cancer related information. As a first attempt in this task, we explored multi-label and multi-class text classification approaches to code the purpose, subjects of interaction, and the responses provided based on the chat transcripts. With a sample dataset of about 673 transcripts, we achieved example-based F-scores of 0.67 (for subjects) and 0.58 (responses). We also achieved label-based micro F-scores of 0.65 (for subjects), 0.62 (for responses), and 0.61 (for purpose). To our knowledge this is the first attempt in automatic coding of LiveHelp transcripts and our initial results on the smaller corpus indicate promising future directions in this task.


Generalized Bregman Divergence and Gradient of Mutual Information for Vector Poisson Channels

arXiv.org Machine Learning

We investigate connections between information-theoretic and estimation-theoretic quantities in vector Poisson channel models. In particular, we generalize the gradient of mutual information with respect to key system parameters from the scalar to the vector Poisson channel model. We also propose, as another contribution, a generalization of the classical Bregman divergence that offers a means to encapsulate under a unifying framework the gradient of mutual information results for scalar and vector Poisson and Gaussian channel models. The so-called generalized Bregman divergence is also shown to exhibit various properties akin to the properties of the classical version. The vector Poisson channel model is drawing considerable attention in view of its application in various domains: as an example, the availability of the gradient of mutual information can be used in conjunction with gradient descent methods to effect compressive-sensing projection designs in emerging X-ray and document classification applications.