A Novel Two-Step Method for Cross Language Representation Learning

Dec-31-2013–Neural Information Processing Systems

Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning is that words of different languages lie in disjoint feature spaces. In this paper, we propose a two-step representation learning method that bridges the feature spaces of different languages by exploiting a set of parallel bilingual documents. Specifically, we first formulate a matrix completion problem to produce a complete parallel document-term matrix for all documents in two languages, and then induce a cross-lingual document representation by applying latent semantic indexing to the obtained matrix. We use a projected gradient descent algorithm to solve the formulated matrix completion problem with convergence guarantees. The proposed approach is evaluated through a set of experiments on cross language sentiment classification tasks using Amazon product reviews. The experimental results demonstrate that the proposed learning approach outperforms a number of competing cross language representation learning methods, especially when the number of parallel bilingual documents is small.
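
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the completion objective, so everything below is an assumption made for illustration: the squared-error loss on observed entries, the projection onto rank-r matrices by truncated SVD, the function names (`rank_projection`, `complete_matrix`, `lsi_embed`), the step size, and the toy data. It is not the paper's formulation, only one simple instance of projected gradient descent for matrix completion followed by latent semantic indexing.

```python
import numpy as np

def rank_projection(M, rank):
    """Project M onto the set of matrices with rank <= `rank` (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def complete_matrix(M_obs, mask, rank=2, step=1.0, n_iter=200):
    """Assumed objective: 0.5 * ||mask * (X - M_obs)||_F^2 subject to rank(X) <= rank.
    Each iteration takes a gradient step, then projects back onto the rank constraint."""
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        grad = mask * (X - M_obs)          # gradient is nonzero only on observed entries
        X = rank_projection(X - step * grad, rank)
    return X

def lsi_embed(X, k):
    """Latent semantic indexing: k-dimensional document vectors from a truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy document-term matrix (hypothetical data): 6 documents,
# 4 source-language terms followed by 4 target-language terms.
rng = np.random.default_rng(0)
truth = rng.random((6, 2)) @ rng.random((2, 8))   # rank-2 ground truth

mask = np.zeros_like(truth)
mask[0:2, :] = 1.0     # parallel documents: both language halves observed
mask[2:4, :4] = 1.0    # source-only documents: target half missing
mask[4:6, 4:] = 1.0    # target-only documents: source half missing
M_obs = mask * truth

# Step 1: fill in the missing halves. Step 2: embed all documents in one space.
X_hat = complete_matrix(M_obs, mask, rank=2)
doc_vectors = lsi_embed(X_hat, k=2)

initial_err = np.linalg.norm(mask * M_obs)
final_err = np.linalg.norm(mask * (X_hat - M_obs))
print(doc_vectors.shape, initial_err, final_err)
```

In this sketch the rows of `doc_vectors` play the role of the cross-lingual document representation: source-only and target-only documents end up in the same low-dimensional space, so a classifier trained on one language's rows could be applied to the other's.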

artificial intelligence, machine learning, natural language

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Text Processing (1.00)
    - Discourse & Dialogue (0.70)
