Using k-Way Co-Occurrences for Learning Word Embeddings

AAAI Conferences

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior works on word embedding learning have used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and to co-occur in the same context. We extend the notion of co-occurrences to cover k (≥2)-way co-occurrences among a set of k words. Specifically, we prove a theoretical relationship between the joint probability of k (≥2) words and the sum of l_2 norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller k (≤5) values, k-way embeddings perform comparably to or better than 2-way embeddings in a range of tasks.
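To make the notion of a k-way co-occurrence concrete, the following is a minimal sketch of how such counts might be gathered from a token sequence. It is not the authors' procedure: the sliding-window definition of "context", the treatment of a co-occurrence as an unordered set of k distinct words, and the names `kway_cooccurrences` and `window` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def kway_cooccurrences(tokens, k, window):
    """Count unordered sets of k distinct words seen together in a sliding window.

    Assumption of this sketch: a set is counted once per window that contains it,
    so overlapping windows can count the same set more than once.
    """
    counts = Counter()
    # Slide the window one token at a time; a short text yields a single window.
    for start in range(max(1, len(tokens) - window + 1)):
        # Deduplicate and sort so each set of words maps to one canonical tuple.
        span = sorted(set(tokens[start:start + window]))
        for combo in combinations(span, k):
            counts[combo] += 1
    return counts


if __name__ == "__main__":
    toy = "the cat sat on the mat".split()
    for words, n in kway_cooccurrences(toy, k=3, window=4).most_common(3):
        print(words, n)
```

With k = 2 this reduces to ordinary pairwise co-occurrence counting. The number of candidate sets per window grows combinatorially in k, which gives an intuition for the data sparsity the abstract mentions.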

Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings

Artificial Intelligence

There exist two main approaches to automatically extract affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.
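As an illustration of the general idea of expanding a lexicon over word embeddings, here is a sketch of classic iterative label propagation on a similarity graph. It is not the novel variant or the batch gradient descent optimization that the abstract describes. The clipped cosine edge weights, the single scalar score per word, and the names `cosine` and `propagate_labels` are assumptions of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(embeddings, seeds, iterations=20):
    """Spread seed scores to unlabeled words over an embedding similarity graph.

    embeddings: dict mapping word -> vector (list of floats)
    seeds: dict mapping word -> known score; seed scores are clamped every round
    """
    words = list(embeddings)
    # Assumption: edge weight is cosine similarity clipped at zero, fully connected graph.
    sim = {
        u: {v: max(0.0, cosine(embeddings[u], embeddings[v])) for v in words if v != u}
        for u in words
    }
    scores = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(iterations):
        updated = {}
        for w in words:
            if w in seeds:
                updated[w] = seeds[w]  # keep lexicon entries fixed
                continue
            total = sum(sim[w].values())
            # Unlabeled words take the similarity-weighted average of their neighbors.
            weighted = sum(weight * scores[v] for v, weight in sim[w].items())
            updated[w] = weighted / total if total else 0.0
        scores = updated
    return scores


if __name__ == "__main__":
    toy = {"joy": [1.0, 0.0], "glad": [0.9, 0.1], "grief": [0.0, 1.0], "sad": [0.1, 0.9]}
    print(propagate_labels(toy, seeds={"joy": 1.0, "grief": 0.0}))
```

In this toy example the two-dimensional vectors are made up. Words that lie close to a seed in embedding space end up with scores close to that seed's score.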

Regularized Learning with Networks of Features

Neural Information Processing Systems

For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, or when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should therefore be expected to have similar weights in a good model. Here we present a framework for regularized learning in settings where one has prior knowledge about which features are expected to have similar and dissimilar weights. This prior knowledge is encoded as a graph whose vertices represent features and whose edges represent similarities and dissimilarities between them. During learning, each feature's weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using graphs of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature graphs constructed from declarative human knowledge, as well as from auxiliary task learning, significantly improve prediction accuracy.
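The penalty described above, where each weight is penalized by how far it lies from the average weight of its neighbors, can be sketched as follows. The squared difference, the uniform averaging over neighbors, the decision to skip features without neighbors, and the name `neighbor_average_penalty` are assumptions of this sketch rather than details given in the abstract.

```python
def neighbor_average_penalty(weights, neighbors):
    """Sum over features of (own weight - mean weight of graph neighbors)^2.

    weights: list of floats, one per feature
    neighbors: dict mapping feature index -> list of neighboring feature indices
    """
    total = 0.0
    for j, w_j in enumerate(weights):
        nbrs = neighbors.get(j, [])
        if not nbrs:
            continue  # assumption: a feature with no neighbors incurs no penalty
        avg = sum(weights[k] for k in nbrs) / len(nbrs)
        total += (w_j - avg) ** 2
    return total


if __name__ == "__main__":
    # Features 0 and 1 agree; feature 2 deviates from the average of its neighbors.
    w = [1.0, 1.0, 3.0]
    graph = {0: [1], 1: [0], 2: [0, 1]}
    print(neighbor_average_penalty(w, graph))
```

In a training loop, a term like this would be scaled by a regularization strength and added to the usual loss, so that features linked in the graph are pulled toward similar weights.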

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

AAAI Conferences

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods consider only relationships within word pairs when using KBs. We argue that the structural information of well-organized words within KBs conveys more effective and stable knowledge for capturing the semantics of words. In this paper, we propose a semantic structure-based word embedding method, and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including methods trained solely on the corpus and methods trained on both the corpus and the KBs.

Learning Word Representations from Relational Graphs

AAAI Conferences

If we already know a particular concept such as pets, we can describe a new concept such as dogs by stating the semantic relations that the new concept shares with the existing concepts such as dogs belongs-to pets. Alternatively, we could describe a novel concept by listing all the attributes it shares with existing concepts. In our example, we can describe the concept dog by listing attributes such as mammal, carnivorous, and domestic animal that it shares with another concept such as the cat. Therefore, both attributes and relations can be considered as alternative descriptors of the same knowledge. This close connection between attributes and relations can be seen in knowledge representation schemes such as predicate logic, where attributes ...

... representations by considering the semantic relations between words. Specifically, given as input a relational graph, a directed labelled weighted graph where vertices represent words and edges represent numerous semantic relations that exist between the corresponding words, we consider the problem of learning a vector representation for each vertex (word) in the graph and a matrix representation for each label type (pattern). The learnt word representations are evaluated for their accuracy by using them to solve semantic word analogy questions on a benchmark dataset. Our task of learning word attributes using relations between words is challenging because of several reasons. First, there can be multiple semantic relations between two words.