Goto

Collaborating Authors

 Asia


Local Translation Prediction with Global Sentence Representation

AAAI Conferences

Statistical machine translation models have made great progress in improving the translation quality. However, the existing models predict the target translation with only the source- and target-side local context information. In practice, distinguishing good translations from bad ones does not only depend on the local features, but also rely on the global sentence-level information. In this paper, we explore the source-side global sentence-level features for target-side local translation prediction. We propose a novel bilingually-constrained chunk-based convolutional neural network to learn sentence semantic representations. With the sentence-level feature representation, we further design a feed-forward neural network to better predict translations using both local and global information. The large-scale experiments show that our method can obtain substantial improvements in translation quality over the strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model.


Learning Term Embeddings for Hypernymy Identification

AAAI Conferences

Hypernymy identification aims at detecting if isA relationship holds between two words or phrases. Most previous methods are based on lexical patterns or the Distributional Inclusion Hypothesis, and the accuracy of such methods is not ideal. In this paper, we propose a simple yet effective supervision framework to identify hypernymy relations using distributed term representations (a.k.a term embeddings). First, we design a distance-margin neural network to learn term embeddings based on some pre-extracted hypernymy data. Then, we apply such embeddings as term features to identify positive hypernymy pairs through a supervision method. Experimental results demonstrate that our approach outperforms other supervised methods on two popular datasets and the learned term embeddings has better quality than existing term distributed representations with respect to hypernymy identification.


Optimizing Sentence Modeling and Selection for Document Summarization

AAAI Conferences

Extractive document summarization aims to conclude given documents by extracting some salient sentences. Often, it faces two challenges: 1) how to model the information redundancy among candidate sentences; 2) how to select the most appropriate sentences. This paper attempts to build a strong summarizer DivSelect+CNNLM by presenting new algorithms to optimize each of them. Concretely, it proposes CNNLM, a novel neural network language model (NNLM) based on convolutional neural network (CNN), to project sentences into dense distributed representations, then models sentence redundancy by cosine similarity. Afterwards, it formulates the selection process as an optimization problem, constructing a diversified selection process (DivSelect) with the aim of selecting some sentences which have high prestige, meantime, are dis-similar with each other. Experimental results on DUC2002 and DUC2004 benchmark data sets demonstrate the effectiveness of our approach.


Compressive Document Summarization via Sparse Optimization

AAAI Conferences

In this paper, we formulate a sparse optimization framework for extractive document summarization. The proposed framework has a decomposable convex objective function. We derive an efficient ADMM algorithm to solve it. To encourage diversity in the summaries, we explicitly introduce an additional sentence dissimilarity term in the optimization framework. We achieve significant improvement over previous related work under similar data reconstruction framework. We then generalize our formulation to the case of compressive summarization and derive a block coordinate descent algorithm to optimize the objective function. Performance on DUC 2006 and DUC 2007 datasets shows that our compressive summarization results are competitive against the state-of-the-art results while maintaining reasonable readability.


Convolutional Neural Networks for Text Hashing

AAAI Conferences

Hashing, as a popular approximate nearest neighbor search, has been widely used for large-scale similarity search. Recently, a spectrum of machine learning methods are utilized to learn similarity-preserving binary codes. However, most of them directly encode the explicit features, keywords, which fail to preserve the accurate semantic similarities in binary code beyond keyword matching, especially on short texts. Here we propose a novel text hashing framework with convolutional neural networks. In particular, we first embed the keyword features into compact binary code with a locality preserving constraint. Meanwhile word features and position features are together fed into a convolutional network to learn the implicit features which are further incorporated with the explicit features to fit the pre-trained binary code. Such base method can be successfully accomplished without any external tags/labels, and other three model variations are designed to integrate tags/labels. Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods when tested on one short text dataset as well as one normal text dataset.


Modeling Quantum Entanglements in Quantum Language Models

AAAI Conferences

Recently, a Quantum Language Model (QLM) was proposed to model term dependencies upon Quantum Theory (QT) framework and successively applied in Information Retrieval (IR). Nevertheless, QLM's dependency is based on co-occurrences of terms and has not yet taken into account the Quantum Entanglement (QE), which is a key quantum concept and has a significant cognitive implication. In QT, an entangled state can provide a more complete description for the nature of realities, and determine intrinsic correlations of considered objects globally, rather than those co-occurrences on the surface. It is, however, a real challenge to decide and measure QE using the classical statistics of texts in a post-measurement configuration. In order to circumvent this problem, we theoretically prove the connection between QE and statistically Unconditional Pure Dependence (UPD). Since UPD has an implementable deciding algorithm, we can in turn characterize QE by extracting the UPD patterns from texts. This leads to a measurable QE, based on which we further advance the existing QLM framework. We empirically compare our model with related models, and the results demonstrate the effectiveness of our model.


Syntax-Based Deep Matching of Short Texts

AAAI Conferences

Many tasks in natural language processing, ranging from machine translation to question answering, can be reduced to the problem of matching two sentences or more generally two short texts. We propose a new approach to the problem, called Deep Match Tree (DeepMatch_tree), under a general setting. The approach consists of two components, 1) a mining algorithm to discover patterns for matching two short-texts, defined in the product space of dependency trees, and 2) a deep neural network for matching short texts using the mined patterns, as well as a learning algorithm to build the network having a sparse structure. We test our algorithm on the problem of matching a tweet and a response in social media, a hard matching problem proposed in [Wang et al., 2013], and show that DeepMatch_tree can outperform a number of competitor models including one without using dependency trees and one based on word-embedding, all with large margins.


Target-Dependent Twitter Sentiment Classification with Rich Automatic Features

AAAI Conferences

Target-dependent sentiment analysis on Twitter has attracted increasing research attention. Most previous work relies on syntax, such as automatic parse trees, which are subject to noise for informal text such as tweets. In this paper, we show that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features. In particular, we split a tweet into a left context and a right context according to a given target, using distributed word representations and neural pooling functions to extract features. Both sentiment-driven and standard embeddings are used, and a rich set of neural pooling functions are explored. Sentiment lexicons are used as an additional source of information for feature extraction. In standard evaluation, the conceptually simple method gives a 4.8% absolute improvement over the state-of-the-art on three-way targeted sentiment classification, achieving the best reported results for this task.


User Modeling with Neural Network for Review Rating Prediction

AAAI Conferences

We present a neural network method for review rating prediction in this paper. Existing neural network methods for sentiment prediction typically only capture the semantics of texts, but ignore the user who expresses the sentiment.This is not desirable for review rating prediction as each user has an influence on how to interpret the textual content of a review.For example, the same word (e.g. good) might indicate different sentiment strengths when written by different users. We address this issue by developing a new neural network that takes user information into account. The intuition is to factor in user-specific modification to the meaning of a certain word.Specifically, we extend the lexical semantic composition models and introduce a user-word composition vector model (UWCVM), which effectively captures how user acts as a function affecting the continuous word representation. We integrate UWCVM into a supervised learning framework for review rating prediction, andconduct experiments on two benchmark review datasets.Experimental results demonstrate the effectiveness of our method. It shows superior performances over several strong baseline methods.


Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation

AAAI Conferences

Given a query consisting of a mention (name string) and a background document,entity disambiguation calls for linking the mention to an entity from reference knowledge base like Wikipedia.Existing studies typically use hand-crafted features to represent mention, context and entity, which is labor-intensive and weak to discover explanatory factors of data.In this paper, we address this problem by presenting a new neural network approach.The model takes consideration of the semantic representations of mention, context and entity, encodes them in continuous vector space and effectively leverages them for entity disambiguation.Specifically, we model variable-sized contexts with convolutional neural network, and embed the positions of context words to factor in the distance between context word and mention.Furthermore, we employ neural tensor network to model the semantic interactions between context and mention.We conduct experiments for entity disambiguation on two benchmark datasets from TAC-KBP 2009 and 2010.Experimental results show that our method yields state-of-the-art performances on both datasets.