Semantic Networks
NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding
Zhang, Yongqi, Yao, Quanming, Shao, Yingxia, Chen, Lei
Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into a low-dimensional vector space, which can be used by subsequent algorithms. Negative sampling, which samples negative triplets from the non-observed ones in the training data, is an important step in KG embedding. Recently, the generative adversarial network (GAN) has been introduced into negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradients and thus obtain better performance. However, using a GAN makes the original model more complex and harder to train, since reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with a cache. However, how to sample from and how to update the cache are two important questions. We carefully design the solutions, which are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a "distilled" version of previous GAN-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. Extensive experiments show that our method gains significant improvements on various KG embedding models and outperforms the state-of-the-art negative sampling methods based on GAN.
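The cache idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform draw from the cache, and the exp(score)-weighted refresh are assumptions chosen only to show how a cache can hold high-scoring negatives while still mixing in random candidates. Filtering out true triplets is omitted for brevity.

```python
import math
import random

def refresh_cache(cache, key, entities, score_fn, cache_size, n_random, rng):
    """Merge random candidates into the cache for `key`, then keep
    `cache_size` entries drawn with probability proportional to exp(score).
    The random candidates provide exploration; the score weighting
    provides exploitation. (Hypothetical update rule, for illustration.)"""
    pool = list(cache.get(key, [])) + rng.sample(entities, n_random)
    weights = [math.exp(score_fn(key, tail)) for tail in pool]
    cache[key] = rng.choices(pool, weights=weights, k=cache_size)

def sample_negative(cache, key, rng):
    """Draw one cached tail uniformly to corrupt the positive triplet."""
    return rng.choice(cache[key])

# Toy usage: entity 7 is the hardest negative under this made-up score.
rng = random.Random(0)
entities = list(range(20))

def toy_score(key, tail):
    return -abs(tail - 7) / 2.0

cache = {}
key = ("head_3", "rel_1")  # hypothetical (head, relation) pair
for _ in range(5):
    refresh_cache(cache, key, entities, toy_score,
                  cache_size=4, n_random=6, rng=rng)
negative_tail = sample_negative(cache, key, rng)
```

In a real training loop, `score_fn` would be the embedding model's own scoring function, so the cache contents would change as the embeddings are updated.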
An Unbiased Approach to Quantification of Gender Inclination using Interpretable Word Representations
Rekabsaz, Navid, Hanbury, Allan
Recent advances in word embedding provide significant benefit to various information processing tasks. Yet these dense representations and their estimation of word-to-word relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word representations, i.e. vectors with clearly defined dimensions (which can be words, windows of words, or documents), are easily interpretable, and recent methods show performance competitive with that of dense vectors. In this work, we propose a method to transfer the word2vec SkipGram embedding model to its explicit representation model. The method provides interpretable explicit vectors while keeping the effectiveness of the original model, tested by evaluating the model on several word association collections. Based on the proposed explicit representation, we propose a novel method to quantify the degree of gender bias in the English language (as used in Wikipedia) with regard to a set of occupations. By measuring the bias towards explicit Female and Male factors, the work demonstrates a general tendency of the majority of the occupations towards male and a strong bias in a few specific occupations (e.g. nurse) towards female.
Improved Knowledge Graph Embedding using Background Taxonomic Information
Fatemi, Bahare, Ravanbakhsh, Siamak, Poole, David
Knowledge graphs are used to represent relational information in terms of triples. To enable learning about domains, embedding models, such as tensor factorization models, can be used to make predictions of new triples. Often there is background taxonomic information (in terms of subclasses and subproperties) that should also be taken into account. We show that existing fully expressive (a.k.a. universal) models cannot provably respect subclass and subproperty information. We show that minimal modifications to an existing knowledge graph completion method enable injection of taxonomic information. Moreover, we prove that our model is fully expressive, assuming a lower bound on the size of the embeddings. Experimental results on public knowledge graphs show that despite its simplicity our approach is surprisingly effective.
From Word To Sense Embeddings: A Survey on Vector Representations of Meaning
Camacho-Collados, Jose, Pilehvar, Mohammad Taher
Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.
Approach for Semi-automatic Construction of Anti-infective Drug Ontology Based on Entity Linking
Shen, Ying, Deng, Yang, Yuan, Kaiqi, Liu, Li, Liu, Yong
The task of entity relation extraction discovers new relation facts and enables broader applications of knowledge graphs. Distant supervision is widely adopted for relation extraction, which requires large amounts of text containing entity pairs as training data. However, in some specific domains such as medical-related applications, entity pairs that have certain relations might not appear together, so it is difficult to meet the requirement for distantly supervised relation extraction. In light of this challenge, we propose a novel path-based model to discover new entity relation facts. Instead of finding texts for relation extraction, the proposed method extracts path-only information for entity pairs from the current knowledge graph. For each pair of entities, multiple paths can be extracted, and some of them are more useful for relation extraction than others. To capture this observation, we employ an attention mechanism to assign different weights to different paths, which highlights the useful paths for entity relation extraction. To demonstrate the effectiveness of the proposed method, we conduct various experiments on a large-scale medical knowledge graph. Compared with the state-of-the-art relation extraction methods using the structure of the knowledge graph, the proposed method significantly improves the accuracy of extracted relation facts and achieves the best performance.
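The attention step described in this abstract, weighting several paths between one entity pair, can be sketched as follows. This is a generic dot-product attention under stated assumptions, not the paper's model: the path vectors, the `query` vector for the candidate relation, and both function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_paths(path_vectors, query):
    """Score each path by its dot product with `query`, normalise the
    scores into attention weights, and return the weighted sum of paths."""
    scores = [sum(p * q for p, q in zip(path, query)) for path in path_vectors]
    weights = softmax(scores)
    pooled = [sum(w * path[i] for w, path in zip(weights, path_vectors))
              for i in range(len(query))]
    return weights, pooled

# Toy usage: three encoded paths between one entity pair (made-up values).
paths = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
relation_query = [2.0, 0.0]
weights, pooled = attend_paths(paths, relation_query)
```

The first path aligns best with the query, so it receives the largest weight and dominates the pooled representation.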
Embedding Uncertain Knowledge Graphs
Chen, Xuelu, Chen, Muhao, Shi, Weijia, Sun, Yizhou, Zaniolo, Carlo
Embedding models for deterministic Knowledge Graphs (KG) have been extensively studied, with the purpose of capturing latent semantic relations between entities and incorporating the structured knowledge into machine learning. However, there are many KGs that model uncertain knowledge, which typically model the inherent uncertainty of relation facts with a confidence score, and embedding such uncertain knowledge represents an unresolved challenge. Capturing uncertain knowledge will benefit many knowledge-driven applications, such as question answering and semantic search, by providing a more natural characterization of the knowledge. In this paper, we propose a novel uncertain KG embedding model, UKGE, which aims to preserve both structural and uncertainty information of relation facts in the embedding space. Unlike previous models that characterize relation facts with binary classification techniques, UKGE learns embeddings according to the confidence scores of uncertain relation facts. To further enhance the precision of UKGE, we also introduce probabilistic soft logic to infer confidence scores for unseen relation facts during training. We propose and evaluate two variants of UKGE based on different learning objectives. Experiments are conducted on three real-world uncertain KGs via three tasks, i.e. confidence prediction, relation fact ranking, and relation fact classification. UKGE shows effectiveness in capturing uncertain knowledge by achieving promising results on these tasks, and consistently outperforms baselines.
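Learning from confidence scores rather than binary labels can be sketched as a regression objective. The abstract does not specify the scoring function, so the element-wise product score, the logistic mapping to [0, 1], and all names below are assumptions made for illustration only.

```python
import math

def plausibility(head, rel, tail):
    """Assumed scoring function: sum of element-wise products."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))

def predicted_confidence(head, rel, tail, weight=1.0, bias=0.0):
    """Map the unbounded plausibility score into [0, 1] with a logistic
    function (one possible mapping; `weight` and `bias` are hypothetical)."""
    z = weight * plausibility(head, rel, tail) + bias
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(head, rel, tail, confidence):
    """Regression loss against the observed confidence of the fact."""
    return (predicted_confidence(head, rel, tail) - confidence) ** 2

# Toy usage with made-up two-dimensional embeddings.
neutral = predicted_confidence([1.0, 0.0], [1.0, 1.0], [0.0, 1.0])
strong = predicted_confidence([1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
loss = squared_error([1.0, 0.0], [1.0, 1.0], [0.0, 1.0], confidence=0.5)
```

Minimising this loss pulls the predicted confidence of each fact towards its observed score, in contrast to a binary objective that only separates true from false triples.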
Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
Wang, Yansen, Shen, Ying, Liu, Zhun, Liang, Paul Pu, Zadeh, Amir, Morency, Louis-Philippe
Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
The Enterprise Knowledge Graph
Best conceived of as a "company brain," this knowledge graph focuses on integrating an organization's assortment of people, skills, experiences, materials, essential company databases, and projects, which greatly improves its self-knowledge and thereby yields competitive advantage. Compiled from combing through myriad databases, including those for human resources, emails, and manifold other sources, this knowledge graph provides the foundation for a rapid, detailed assessment of what knowledge and skills a company has at its disposal--and their relation to one another. This graph is designed to create better services and is extremely specific to an organization's industry, line of business, and area of specialization. For example, Google's and Yahoo's search engine endeavors mandate that they collect knowledge about every entity or subject in the world, so they can offer the most relevant, revealing information to their users. LinkedIn's knowledge graph, on the other hand, details people's professions, resumes, and career opportunities. Again, the relationships between these nodes are paramount.
Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces
Prokhorov, Victor, Pilehvar, Mohammad Taher, Kartsaklis, Dimitri, Lio, Pietro, Collier, Nigel
Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not usually appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in vitro, on multiple rare word similarity datasets, and in vivo, in two downstream text classification tasks.
Differentiating Concepts and Instances for Knowledge Graph Embedding
Lv, Xin, Hou, Lei, Li, Juanzi, Liu, Zhiyuan
Concepts, which represent a group of different instances sharing common properties, are essential information in knowledge representation. Most conventional knowledge embedding methods encode both entities (concepts and instances) and relations as vectors in a low dimensional semantic space equally, ignoring the difference between concepts and instances. In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. Specifically, TransC encodes each concept in a knowledge graph as a sphere and each instance as a vector in the same semantic space. We use the relative positions to model the relations between concepts and instances (i.e., instanceOf), and the relations between concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both link prediction and triple classification tasks on the dataset based on YAGO. Experimental results show that TransC outperforms state-of-the-art methods, and captures the semantic transitivity for instanceOf and subClassOf relation. Our codes and datasets can be obtained from https://github.com/davidlvxin/TransC.
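The sphere-and-vector geometry described in this abstract can be sketched with two containment checks. This is a simplified reading, not the loss used in the paper: the function names and the exact violation terms are assumptions, with a violation of 0.0 meaning the relation holds geometrically.

```python
import math

def instance_of_violation(instance, center, radius):
    """How far the instance vector lies outside the concept sphere."""
    return max(0.0, math.dist(instance, center) - radius)

def sub_class_of_violation(sub_center, sub_radius, sup_center, sup_radius):
    """How far the sub-concept sphere sticks out of the super-concept
    sphere (full containment requires distance + sub_radius <= sup_radius)."""
    return max(0.0,
               math.dist(sub_center, sup_center) + sub_radius - sup_radius)

# Toy usage with made-up two-dimensional embeddings.
instance = (0.1, 0.0)
city_center, city_radius = (0.0, 0.0), 0.5    # hypothetical sub-concept
place_center, place_radius = (0.2, 0.0), 1.0  # hypothetical super-concept

in_city = instance_of_violation(instance, city_center, city_radius)
city_in_place = sub_class_of_violation(city_center, city_radius,
                                       place_center, place_radius)
in_place = instance_of_violation(instance, place_center, place_radius)
outside = instance_of_violation((2.0, 0.0), city_center, city_radius)
```

Because the instance lies inside the smaller sphere and that sphere lies inside the larger one, the instance also lies inside the larger sphere, which is the geometric form of the transitivity the abstract mentions.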