Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings

arXiv.org Machine Learning

Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task-specific, cross-lingual correspondences. It transfers knowledge by identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language. This one-to-one mapping between words in different languages is too strict, and it ignores context entirely. In this paper, we propose a cross-lingual SCL based on distributed representations of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary. We conduct experiments on the NLP&CC 2013 cross-lingual sentiment analysis dataset, employing English as the source language and Chinese as the target language. Our method does not rely on parallel corpora, and the experimental results show that our approach is more competitive than the state-of-the-art methods in cross-lingual sentiment classification.
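As a rough illustration of what a one-to-many pivot mapping could look like, the sketch below ranks target-language words by cosine similarity to a source pivot and keeps the top few. It assumes the source and target vectors already live in one shared embedding space (for instance after a projection learned from a small dictionary); the abstract does not say how that step is done. The function names, the threshold `min_sim`, the pinyin placeholder tokens, and all vector values are invented for the example and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def one_to_many(pivot_vec, target_vecs, k=2, min_sim=0.5):
    """Return up to k target words most similar to the pivot, best first.

    Assumption: pivot_vec and target_vecs share one embedding space.
    """
    scored = [(word, cosine(pivot_vec, vec)) for word, vec in target_vecs.items()]
    scored = [(word, sim) for word, sim in scored if sim >= min_sim]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy, hand-made vectors (hypothetical); target tokens are pinyin placeholders.
source_pivots = {"good": [0.9, 0.1, 0.0]}
target_words = {
    "hao": [0.8, 0.2, 0.0],      # close to "good"
    "bucuo": [0.7, 0.3, 0.1],    # also close to "good"
    "cha": [-0.8, 0.1, 0.2],     # opposite polarity
    "zhuozi": [0.0, 0.1, 0.9],   # unrelated word
}

mapping = {p: one_to_many(v, target_words) for p, v in source_pivots.items()}
for pivot, candidates in mapping.items():
    print(pivot, [(word, round(sim, 3)) for word, sim in candidates])
```

A one-to-one oracle would keep only the first candidate; retaining several lets each pivot be represented by more than one target-language word.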


A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data

AAAI Conferences

Cross-lingual sentiment classification aims to automatically predict the sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich source language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature space of the source language data and that of the target language data. To address this challenge, previous work mainly relies on large amounts of bilingual parallel corpora to bridge the language gap. In many real applications, however, only partial parallel data is available, and acquiring a large amount of parallel data across languages is expensive and time-consuming. In this paper, we propose a novel subspace learning framework that leverages partial parallel data for cross-lingual sentiment classification. The proposed approach jointly learns from document-aligned review data and unaligned data in the source and target languages via a non-negative matrix factorization framework. We conduct a set of experiments on cross-lingual sentiment classification tasks using multilingual Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
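The idea of a shared subspace learned by non-negative matrix factorization can be sketched in a heavily simplified form. The example below handles only the document-aligned part: it stacks the source and target term-document matrices of aligned review pairs and factorizes the stack with standard multiplicative updates, so each aligned pair shares one latent column. The unaligned data and the actual objective of the paper are not modeled; the matrices, the rank, and all names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(x, w, h):
    """Squared Frobenius norm of x - w @ h."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative x (terms x docs) as w @ h with multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    initial_error = frob_error(x, w, h)
    for _ in range(iters):
        # Update the shared document representation h.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Update the term loadings w.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h, initial_error

# Toy term-document counts (hypothetical); column j of each matrix is the same review.
x_src = [[2, 3, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1]]  # source-language terms
x_tgt = [[3, 2, 0, 1], [0, 0, 2, 3], [0, 0, 1, 2]]  # target-language terms
stacked = x_src + x_tgt                              # 6 terms x 4 aligned documents

w, h, initial_error = nmf(stacked, rank=2)
w_src, w_tgt = w[:3], w[3:]       # language-specific term loadings
final_error = frob_error(stacked, w, h)
print(round(initial_error, 3), round(final_error, 3))
```

Because `h` is shared, a classifier trained on the latent columns of labeled source reviews could in principle be applied to target reviews projected into the same space; that step is left out here.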


Li

AAAI Conferences

Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task-specific, cross-lingual correspondences. It transfers knowledge by identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language. This one-to-one mapping between words in different languages is too strict, and it ignores context entirely. In this paper, we propose a cross-lingual SCL based on distributed representations of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary. We conduct experiments on the NLP&CC 2013 cross-lingual sentiment analysis dataset, employing English as the source language and Chinese as the target language. Our method does not rely on parallel corpora, and the experimental results show that our approach is more competitive than the state-of-the-art methods in cross-lingual sentiment classification.


Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification

Journal of Artificial Intelligence Research

Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains, where each dimension reflects the term's distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.
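A minimal sketch of the pivot-based term representation described above: each term is mapped to a vector with one dimension per pivot, and each value is a correspondence score between the term and that pivot, computed within the term's own domain. Cosine similarity over document-occurrence vectors is used here as one plausible DCF; the abstract does not specify which DCFs the authors propose, so this choice, the toy documents, the pivot list, and the function names are all assumptions.

```python
import math

def occurrence_vector(term, docs):
    """Binary vector marking which documents contain the term."""
    return [1.0 if term in doc else 0.0 for doc in docs]

def cosine(u, v):
    """Cosine similarity, used here as a stand-in DCF (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def dci_profile(term, pivots, docs):
    """Represent a term by its correspondence to each pivot within one domain."""
    term_vec = occurrence_vector(term, docs)
    return [cosine(term_vec, occurrence_vector(p, docs)) for p in pivots]

# Toy unlabeled documents as sets of tokens (hypothetical data).
source_docs = [  # e.g. book reviews
    {"good", "gripping", "plot"},
    {"good", "gripping"},
    {"bad", "boring", "plot"},
    {"bad", "boring"},
]
target_docs = [  # e.g. kitchen-appliance reviews
    {"good", "sturdy", "knife"},
    {"good", "sturdy"},
    {"bad", "flimsy", "knife"},
    {"bad", "flimsy"},
]
pivots = ["good", "bad"]  # terms assumed to behave alike in both domains

gripping = dci_profile("gripping", pivots, source_docs)
sturdy = dci_profile("sturdy", pivots, target_docs)
flimsy = dci_profile("flimsy", pivots, target_docs)
print(gripping, sturdy, flimsy)
```

In this toy data the domain-specific terms "gripping" and "sturdy" never co-occur, yet they receive nearly identical profiles because both track the pivot "good", which is the property a classifier trained on source-domain profiles would rely on.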


Semi-Supervised Matrix Completion for Cross-Lingual Text Classification

AAAI Conferences

Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. It is widely studied in natural language processing as a way to reduce the expensive manual annotation effort required in the target language domain. In this work, we propose a novel semi-supervised representation learning approach that addresses this challenging task by inducing interlingual features via semi-supervised matrix completion. To evaluate the proposed learning technique, we conduct extensive experiments on eighteen cross-language sentiment classification tasks with four different languages. The empirical results demonstrate the efficacy of the proposed approach and show that it outperforms a number of related cross-lingual learning methods.