
Using k-Way Co-Occurrences for Learning Word Embeddings

AAAI Conferences

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, much prior work on word embedding learning has used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and to co-occur in the same context. We extend the notion of co-occurrences to cover k-way (k ≥ 2) co-occurrences among a set of k words. Specifically, we prove a theoretical relationship between the joint probability of k (≥ 2) words and the sum of ℓ2 norms of their embeddings. Next, we propose a learning objective, motivated by our theoretical result, that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically and that, despite data sparsity, for some smaller values of k (≤ 5), k-way embeddings perform comparably to or better than 2-way embeddings in a range of tasks.
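To make the notion of a k-way co-occurrence concrete, the Python sketch below counts, for every unordered set of k distinct words, the number of contexts in which all k words appear together, and turns a count into an empirical joint probability. Treating each sentence as one context, and the helper names `kway_cooccurrences` and `joint_probability`, are illustrative assumptions; the abstract does not say how contexts are defined in the paper.

```python
from collections import Counter
from itertools import combinations


def kway_cooccurrences(sentences, k):
    """Count unordered k-word sets that co-occur in the same context.

    Assumption (hypothetical): each sentence is one context, and a given
    k-word set is counted at most once per sentence.
    """
    counts = Counter()
    for sentence in sentences:
        distinct = sorted(set(sentence))  # sorted so each set has one canonical tuple key
        counts.update(combinations(distinct, k))
    return counts


def joint_probability(counts, n_contexts, words):
    """Empirical estimate of p(w1, ..., wk) as the fraction of contexts containing all words."""
    key = tuple(sorted(set(words)))
    return counts[key] / n_contexts


sentences = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
pairs = kway_cooccurrences(sentences, 2)
triples = kway_cooccurrences(sentences, 3)
print(pairs[("cat", "the")], joint_probability(pairs, len(sentences), ["the", "cat"]))
```

A context with n distinct words yields C(n, k) candidate sets, so each particular k-word set is observed less often as k grows, which is one way to see the data sparsity the abstract mentions.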

Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings

Artificial Intelligence

There are two main approaches to automatically extracting affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.
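As a rough illustration of label propagation over word embeddings optimized by gradient descent, the sketch below builds a k-nearest-neighbour graph from cosine similarities and then minimizes the standard harmonic objective trace(Fᵀ L F) by plain gradient descent, keeping the seed (lexicon) words clamped. This is the textbook formulation, not the authors' novel variant; the graph construction, step size, and function names `knn_graph` and `propagate` are assumptions made for the example.

```python
import numpy as np


def knn_graph(emb, k):
    """Symmetric k-nearest-neighbour graph weighted by non-negative cosine similarity."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude each word from its own neighbour list
    n = emb.shape[0]
    W = np.zeros((n, n))
    nearest = np.argsort(-sim, axis=1)[:, :k]
    for i in range(n):
        for j in nearest[i]:
            W[i, j] = max(sim[i, j], 0.0)  # drop negative similarities
    return np.maximum(W, W.T)  # symmetrise


def propagate(W, seeds, n_labels, steps=500):
    """Gradient descent on E(F) = trace(F^T L F) with seed rows held fixed.

    `seeds` maps a word index to its known emotion label (hypothetical format).
    """
    n = W.shape[0]
    F = np.full((n, n_labels), 1.0 / n_labels)  # unlabeled words start uniform
    for i, label in seeds.items():
        F[i] = 0.0
        F[i, label] = 1.0
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    free = np.array([i for i in range(n) if i not in seeds], dtype=int)
    # The Hessian 2*L has eigenvalues <= 4 * max degree, so this step is stable.
    lr = 0.25 / max(W.sum(axis=1).max(), 1e-12)
    for _ in range(steps):
        grad = 2.0 * L @ F  # gradient of trace(F^T L F) for symmetric W
        F[free] -= lr * grad[free]
    return F


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
F = propagate(knn_graph(emb, 1), {0: 0, 2: 1}, n_labels=2)
print(np.argmax(F, axis=1))
```

Each step costs one sparse-able matrix product, which is why a gradient formulation scales to large graphs better than solving the linear system directly.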

Regularized Learning with Networks of Features

Neural Information Processing Systems

For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, or when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should therefore be expected to have similar weights in a good model. Here we present a framework for regularized learning in settings where one has prior knowledge about which features are expected to have similar and dissimilar weights. This prior knowledge is encoded as a graph whose vertices represent features and whose edges represent similarities and dissimilarities between them. During learning, each feature's weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using graphs of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature graphs constructed from declarative human knowledge, as well as from auxiliary task learning, significantly improve prediction accuracy.
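The penalty described above, where each weight is pulled toward the average weight of its neighbours, can be sketched with a squared loss, which admits a closed-form solution. The choice of squared loss, the small ridge term `beta`, and the function name `network_regularized_ls` are assumptions for illustration; the sketch also handles only similarity edges and omits the dissimilar-weight edges mentioned in the abstract.

```python
import numpy as np


def network_regularized_ls(X, y, A, alpha=1.0, beta=1e-3):
    """Minimise ||X w - y||^2 + alpha * ||M w||^2 + beta * ||w||^2.

    (M w)_i = w_i - mean of w_j over the neighbours j of feature i,
    where A is a symmetric 0/1 feature adjacency matrix (the feature graph).
    """
    n_features = X.shape[1]
    deg = A.sum(axis=1)
    # Row-normalised adjacency: P @ w gives each feature's neighbour-average weight.
    P = np.divide(A, deg[:, None], out=np.zeros_like(A, dtype=float),
                  where=deg[:, None] > 0)
    M = np.eye(n_features) - P
    M[deg == 0] = 0.0  # features with no neighbours are not penalised by the graph
    H = X.T @ X + alpha * M.T @ M + beta * np.eye(n_features)
    return np.linalg.solve(H, X.T @ y)


X = np.eye(3)
y = np.array([1.0, 0.0, 0.5])
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])  # features 0 and 1 are "synonyms"
print(network_regularized_ls(X, y, A, alpha=0.0, beta=1e-6))
print(network_regularized_ls(X, y, A, alpha=1e4, beta=1e-6))
```

With `alpha=0` the weights simply fit the data; as `alpha` grows, the weights of the two linked features converge toward a shared value, while the isolated feature is unaffected.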

Relation Adaptation: Learning to Extract Novel Relations with Minimum Supervision

AAAI Conferences

Extracting the relations that exist between two entities is an important step in numerous Web-related tasks such as information extraction. A supervised relation extraction system that is trained to extract a particular relation type might not accurately extract a new type of relation for which it has not been trained. However, it is costly to create training data manually for every new relation type that one might want to extract. We propose a method to adapt an existing relation extraction system to extract new relation types with minimum supervision. Our proposed method comprises two stages: learning a lower-dimensional projection between different relations, and learning a relational classifier for the target relation type with instance sampling. We evaluate the proposed method using a dataset that contains 2000 instances for 20 different relation types. Our experimental results show that the proposed method achieves a statistically significant macro-average F-score of 62.77. Moreover, the proposed method outperforms numerous baselines and a previously proposed weakly-supervised relation extraction method.
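For readers unfamiliar with the metric, a macro-average F-score weights every relation type equally regardless of how many instances it has. The sketch below uses one common definition, the unweighted mean of per-class F1 scores reported on a 0 to 100 scale; the abstract does not state which macro-averaging variant was used, and the function name `macro_f1` is hypothetical.

```python
from collections import Counter


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN), scaled to 0-100."""
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
print(round(macro_f1(gold, pred), 2))  # per-class F1: a = 2/3, b = 0.8
```

Because every relation type contributes equally, a classifier that does well only on frequent relation types is penalised more under macro-averaging than under micro-averaging.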

Learning Word Representations from Relational Graphs

AAAI Conferences

Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes, they are connected by some semantic relations. Conversely, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are interconnected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words, as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic relations between words.
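Semantic word analogy problems have the form "a is to b as c is to ?". One widely used way to answer them with word vectors is the vector-offset (3CosAdd) method, sketched below; the abstract does not say which solver the paper used, so this choice, the toy vectors, and the function name `solve_analogy` are assumptions for illustration.

```python
import numpy as np


def solve_analogy(a, b, c, vocab, emb):
    """Return the word d maximising cos(d, b - a + c), excluding a, b and c (3CosAdd)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    index = {w: i for i, w in enumerate(vocab)}
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target  # proportional to cosine, since every row is unit-length
    for w in (a, b, c):
        scores[index[w]] = -np.inf  # the question words are not valid answers
    return vocab[int(np.argmax(scores))]


vocab = ["man", "woman", "king", "queen", "apple"]
emb = np.array([
    [1.0, 0.0, 0.0],  # man
    [1.0, 1.0, 0.0],  # woman
    [1.0, 0.0, 1.0],  # king
    [1.0, 1.0, 1.0],  # queen
    [0.2, 0.1, 0.0],  # apple
])
print(solve_analogy("man", "woman", "king", vocab, emb))
```

Analogy accuracy is then the fraction of questions for which the returned word matches the expected answer, so better relational structure in the embedding space translates directly into higher accuracy.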