 Statistical Learning


Representation Learning for Measuring Entity Relatedness with Rich Information

AAAI Conferences

Incorporating multiple types of relational information from heterogeneous networks has proved effective in data mining. Although Wikipedia is one of the most famous heterogeneous networks, previous work on semantic analysis of Wikipedia is mostly limited to a single type of relation. In this paper, we aim to incorporate multiple types of relations to measure the semantic relatedness between Wikipedia entities. We propose a framework of coordinate matrix factorization to construct low-dimensional continuous representations for entities, categories and words in the same semantic space. We formulate this task as the completion of a sparse entity-entity association matrix, in which each entry quantifies the strength of relatedness between the corresponding entities. We evaluate our model on the task of judging pair-wise word similarity. Experimental results show that our model outperforms both traditional entity relatedness algorithms and other representation learning models.
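The idea of completing a sparse entity-entity association matrix can be illustrated with a minimal sketch. It is not the paper's coordinate matrix factorization: it factorizes a single symmetric matrix with plain stochastic gradient descent, whereas the abstract describes embedding entities, categories and words jointly. The function names, hyperparameters and toy scores are all assumptions made for illustration.

```python
import random


def relatedness(E, i, j):
    """Predicted relatedness: dot product of two entity vectors."""
    return sum(a * b for a, b in zip(E[i], E[j]))


def factorize(observed, n_entities, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Learn one low-dimensional vector per entity from observed scores.

    observed maps an (i, j) pair of entity indices to a relatedness score.
    Pairs missing from the dict are treated as unknown and never enter
    the loss, which is what makes this a completion problem.
    """
    rng = random.Random(seed)
    E = [[rng.uniform(-0.1, 0.1) for _ in range(rank)]
         for _ in range(n_entities)]
    for _ in range(epochs):
        for (i, j), score in observed.items():
            err = score - relatedness(E, i, j)
            for k in range(rank):
                ei, ej = E[i][k], E[j][k]
                # Gradient step on squared error with L2 regularization.
                E[i][k] += lr * (err * ej - reg * ei)
                E[j][k] += lr * (err * ei - reg * ej)
    return E


# Toy data (hypothetical): entities 0, 1, 2 are related, 3 is unrelated.
observed = {(0, 1): 1.0, (1, 2): 1.0, (0, 3): 0.0, (2, 3): 0.0}
E = factorize(observed, n_entities=4)
# The pair (0, 2) was never observed; its score is filled in by the model.
print(relatedness(E, 0, 2), relatedness(E, 0, 3))
```

Because every entity gets a single vector, an unobserved pair such as (0, 2) receives a score from the vectors that were fitted on the observed pairs.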


Compressive Document Summarization via Sparse Optimization

AAAI Conferences

In this paper, we formulate a sparse optimization framework for extractive document summarization. The proposed framework has a decomposable convex objective function. We derive an efficient ADMM algorithm to solve it. To encourage diversity in the summaries, we explicitly introduce an additional sentence dissimilarity term in the optimization framework. We achieve significant improvement over previous related work under a similar data reconstruction framework. We then generalize our formulation to the case of compressive summarization and derive a block coordinate descent algorithm to optimize the objective function. Performance on the DUC 2006 and DUC 2007 datasets shows that our compressive summarization results are competitive with state-of-the-art results while maintaining reasonable readability.


Syntax-Based Deep Matching of Short Texts

AAAI Conferences

Many tasks in natural language processing, ranging from machine translation to question answering, can be reduced to the problem of matching two sentences or, more generally, two short texts. We propose a new approach to the problem, called Deep Match Tree (DeepMatch_tree), under a general setting. The approach consists of two components: 1) a mining algorithm to discover patterns for matching two short texts, defined in the product space of dependency trees, and 2) a deep neural network for matching short texts using the mined patterns, as well as a learning algorithm to build the network with a sparse structure. We test our algorithm on the problem of matching a tweet and a response in social media, a hard matching problem proposed in [Wang et al., 2013], and show that DeepMatch_tree can outperform a number of competitor models, including one that does not use dependency trees and one based on word embeddings, all by large margins.


An Active Learning Approach to Coreference Resolution

AAAI Conferences

In this paper, we define the problem of coreference resolution in text as one of clustering with pairwise constraints, where human experts are asked to provide pairwise constraints (pairwise judgments of coreferentiality) to guide the clustering process. Positing that these pairwise judgments are easy to obtain from humans given the right context, we show that with a significantly lower number of pairwise judgments and less feature-engineering effort, we can achieve competitive coreference performance. Further, we describe an active learning strategy that minimizes the overall number of such pairwise judgments needed by asking the most informative questions to human experts at each step of coreference resolution. We evaluate this hypothesis and our algorithms on both entity and event coreference tasks and on two languages.


Convolutional Neural Tensor Network Architecture for Community-Based Question Answering

AAAI Conferences

Retrieving similar questions is very important in community-based question answering. A major challenge is the lexical gap in sentence matching. In this paper, we propose a convolutional neural tensor network architecture to encode the sentences in semantic space and model their interactions with a tensor layer. Our model integrates sentence modeling and semantic matching into a single model, which can not only capture the useful information with convolutional and pooling layers, but also learn the matching metrics between the question and its answer. Moreover, our model is a general architecture, with no need for other knowledge such as lexical or syntactic analysis. The experimental results show that our method outperforms the other methods on two matching tasks.
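A tensor layer of this kind can be sketched as follows. The abstract does not give the layer's formula, so this sketch assumes the commonly used neural tensor form: each hidden unit applies tanh to a bilinear term, a linear term over the concatenated vectors, and a bias. The function name and all parameter names are hypothetical, and the input vectors stand in for the output of the convolutional sentence encoder.

```python
import math


def tensor_layer_score(q, a, slices, V, b, u):
    """Score a (question, answer) vector pair with a bilinear tensor layer.

    q, a    : sentence vectors (lists of floats)
    slices  : one matrix per hidden unit, shape len(q) x len(a)
    V       : one weight row per hidden unit over the concatenation q + a
    b       : one bias per hidden unit
    u       : output weights, one per hidden unit
    """
    qa = q + a  # list concatenation, i.e. the stacked vector [q; a]
    hidden = []
    for M, v_row, bias in zip(slices, V, b):
        # Bilinear interaction q^T M a for this slice of the tensor.
        bilinear = sum(q[i] * M[i][j] * a[j]
                       for i in range(len(q)) for j in range(len(a)))
        # Ordinary linear term over the concatenated vectors.
        linear = sum(w * x for w, x in zip(v_row, qa))
        hidden.append(math.tanh(bilinear + linear + bias))
    return sum(w * h for w, h in zip(u, hidden))


# Toy parameters (hypothetical): a single slice that rewards q[0] * a[1].
q, a = [1.0, 0.0], [0.0, 1.0]
score = tensor_layer_score(
    q, a,
    slices=[[[0.0, 2.0], [0.0, 0.0]]],
    V=[[0.0, 0.0, 0.0, 0.0]],
    b=[0.0],
    u=[1.0],
)
print(score)
```

The bilinear term lets every dimension of the question vector interact multiplicatively with every dimension of the answer vector, which a plain concatenation followed by a linear layer cannot do.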


Integrating Importance, Non-Redundancy and Coherence in Graph-Based Extractive Summarization

AAAI Conferences

We propose a graph-based method for extractive single-document summarization which considers importance, non-redundancy and local coherence simultaneously. We represent input documents by means of a bipartite graph consisting of sentence and entity nodes. We rank sentences on the basis of importance by applying a graph-based ranking algorithm to this graph and ensure non-redundancy and local coherence of the summary by means of an optimization step. Our graph-based method is applied to scientific articles from the journal PLOS Medicine. We use human judgements to evaluate the coherence of our summaries. We compare ROUGE scores and human judgements for coherence of different systems on scientific articles. Our method performs considerably better than other systems on this data. Also, our graph-based summarization technique achieves state-of-the-art results on DUC 2002 data. Incorporating our local coherence measure always achieves the best results.
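The importance-ranking step on a sentence-entity bipartite graph can be sketched as below. The abstract does not name the ranking algorithm, so this sketch assumes a HITS-style mutual reinforcement: a sentence is important if it mentions important entities, and an entity is important if important sentences mention it. The function name and the toy document are hypothetical, and the non-redundancy and coherence optimization step is not covered.

```python
import math


def rank_sentences(sentence_entities, iterations=50):
    """Return one importance score per sentence.

    sentence_entities is a list of sets; each set holds the entities
    mentioned in the corresponding sentence (the bipartite edges).
    """
    entities = sorted(set().union(*sentence_entities))
    sent_score = [1.0] * len(sentence_entities)
    for _ in range(iterations):
        # Entity score: sum of the scores of sentences that mention it.
        ent_score = {
            e: sum(s for s, ents in zip(sent_score, sentence_entities)
                   if e in ents)
            for e in entities
        }
        norm = math.sqrt(sum(v * v for v in ent_score.values())) or 1.0
        ent_score = {e: v / norm for e, v in ent_score.items()}
        # Sentence score: sum of the scores of the entities it mentions.
        sent_score = [sum(ent_score[e] for e in ents)
                      for ents in sentence_entities]
        norm = math.sqrt(sum(v * v for v in sent_score)) or 1.0
        sent_score = [v / norm for v in sent_score]
    return sent_score


# Toy document (hypothetical): the third sentence shares no entities
# with the others, so it ends up with the lowest score.
doc = [{"drug", "trial"}, {"drug", "trial", "patients"}, {"weather"}]
scores = rank_sentences(doc)
print(scores)
```

Normalizing after each half-step keeps the scores bounded, so only their relative order matters when selecting sentences.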


Incorporating Domain and Sentiment Supervision in Representation Learning for Domain Adaptation

AAAI Conferences

Domain adaptation aims at learning robust classifiers across domains using labeled data from a source domain. Representation learning methods, which project the original features to a new feature space, have proved to be quite effective for this task. However, these unsupervised methods neglect the domain information of the input and are not specialized for the classification task. In this work, we address two key factors to guide the representation learning process for domain adaptation of sentiment classification: one is domain supervision, which enforces the learned representation to better predict the domain of an input, and the other is sentiment supervision, which utilizes the source domain sentiment labels to learn sentiment-favorable representations. Experimental results show that these two factors significantly improve the proposed models as expected.


Joint Learning of Character and Word Embeddings

AAAI Conferences

Most word embedding methods take a word as a basic unit and learn embeddings according to words' external contexts, ignoring the internal structures of words. However, in some languages such as Chinese, a word is usually composed of several characters and contains rich internal information. The semantic meaning of a word is also related to the meanings of its composing characters. Hence, we take Chinese as an example, and present a character-enhanced word embedding model (CWE). In order to address the issues of character ambiguity and non-compositional words, we propose multiple-prototype character embeddings and an effective word selection method. We evaluate the effectiveness of CWE on word relatedness computation and analogical reasoning. The results show that CWE outperforms other baseline methods which ignore internal character information.


Positive, Negative, or Neutral: Learning an Expanded Opinion Lexicon from Emoticon-Annotated Tweets

AAAI Conferences

We present a supervised framework for expanding an opinion lexicon for tweets. The lexicon contains part-of-speech (POS) disambiguated entries with a three-dimensional probability distribution for positive, negative, and neutral polarities. To obtain this distribution using machine learning, we propose word-level attributes based on POS tags and information calculated from streams of emoticon-annotated tweets. Our experimental results show that our method outperforms the three-dimensional word-level polarity classification performance obtained by semantic orientation, a state-of-the-art measure for establishing word-level sentiment.


Active Learning from Crowds with Unsure Option

AAAI Conferences

Learning from crowds, where the labels of data instances are collected through crowdsourcing, has attracted much attention during the past few years. In contrast to a typical crowdsourcing setting where all data instances are assigned to annotators for labeling, active learning from crowds actively selects a subset of data instances and assigns them to the annotators, thereby reducing the cost of labeling. This paper goes a step further. Rather than assume all annotators must provide labels, we allow the annotators to express that they are unsure about the assigned data instances. By adding the “unsure” option, the workloads for the annotators are somewhat reduced, because saying “unsure” will be easier than trying to provide a crisp label for some difficult data instances. Moreover, it is safer to use “unsure” feedback than to use labels from reluctant annotators, because the latter are more likely to be misleading. Furthermore, different annotators may experience difficulty with different data instances, and thus the unsure option provides a valuable ingredient for modeling crowds’ expertise. We propose the ALCU-SVM algorithm for this new learning problem. Experimental studies on simulated and real crowdsourcing data show that, by exploiting the unsure option, ALCU-SVM achieves very promising performance.