Semantic Networks
SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph
Anteghini, Marco, D'Souza, Jennifer, Santos, Vitor A. P. Martins dos, Auer, Sรถren
As a novel contribution to the problem of semantifying biological assays, in this paper, we propose a neural-network-based approach to automatically semantify, thereby structure, unstructured bioassay text descriptions. Experimental evaluations, to this end, show promise as the neural-based semantification significantly outperforms a naive frequency-based baseline approach. Specifically, the neural method attains 72% F1 versus 47% F1 from the frequency-based method.
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark
We present CoDEx, a set of knowledge graph Completion Datasets Extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from a popular link prediction benchmark by showing that CoDEx covers more diverse and interpretable content, and contains fewer relation patterns that can be covered by trivial frequency-based rules. Data, code, and pretrained models are available at https://github.com/tsafavi/codex.
DistilE: Distiling Knowledge Graph Embeddings for Faster and Cheaper Reasoning
Zhu, Yushan, Zhang, Wen, Chen, Hui, Cheng, Xu, Zhang, Wei, Chen, Huajun
Knowledge Graph Embedding (KGE) is a popular method for KG reasoning and usually a higher dimensional one ensures better reasoning capability. However, high-dimensional KGEs pose huge challenges to storage and computing resources and are not suitable for resource-limited or time-constrained applications, for which faster and cheaper reasoning is necessary. To address this problem, we propose DistilE, a knowledge distillation method to build low-dimensional student KGE from pre-trained high-dimensional teacher KGE. We take the original KGE loss as hard label loss and design specific soft label loss for different KGEs in DistilE. We also propose a two-stage distillation approach to make the student and teacher adapt to each other and further improve the reasoning capability of the student. Our DistilE is general enough to be applied to various KGEs. Experimental results of link prediction show that our method successfully distills a good student which performs better than a same dimensional one directly trained, and sometimes even better than the teacher, and it can achieve 2 times - 8 times embedding compression rate and more than 10 times faster inference speed than the teacher with a small performance loss. We also experimentally prove the effectiveness of our two-stage training proposal via ablation study.
Tax Knowledge Graph for a Smarter and More Personalized TurboTax
Yu, Jay, McCluskey, Kevin, Mukherjee, Saikat
Most knowledge graph use cases are data-centric, focusing on representing data entities and their semantic relationships. There are no published success stories to represent large-scale complicated business logic with knowledge graph technologies. In this paper, we will share our innovative and practical approach to representing complicated U.S. and Canadian income tax compliance logic (calculations and rules) via a large-scale knowledge graph. We will cover how the Tax Knowledge Graph is constructed and automated, how it is used to calculate tax refunds, reasoned to find missing info, and navigated to explain the calculated results. The Tax Knowledge Graph has helped transform Intuit's flagship TurboTax product into a smart and personalized experience, accelerating and automating the tax preparation process while instilling confidence for millions of customers.
Learning semantic Image attributes using Image recognition and knowledge graph embeddings
Tiwari, Ashutosh, Varma, Sandeep
Extracting structured knowledge from texts has traditionally been used for knowledge base generation. However, other sources of information, such as images can be leveraged into this process to build more complete and richer knowledge bases. Structured semantic representation of the content of an image and knowledge graph embeddings can provide a unique representation of semantic relationships between image entities. Linking known entities in knowledge graphs and learning open-world images using language models has attracted lots of interest over the years. In this paper, we propose a shared learning approach to learn semantic attributes of images by combining a knowledge graph embedding model with the recognized attributes of images. The proposed model premises to help us understand the semantic relationship between the entities of an image and implicitly provide a link for the extracted entities through a knowledge graph embedding model. Under the limitation of using a custom user-defined knowledge base with limited data, the proposed model presents significant accuracy and provides a new alternative to the earlier approaches. The proposed approach is a step towards bridging the gap between frameworks which learn from large amounts of data and frameworks which use a limited set of predicates to infer new knowledge.
TorchKGE: Knowledge Graph Embedding in Python and PyTorch
TorchKGE is a Python module for knowledge graph (KG) embedding relying solely on PyTorch. This package provides researchers and engineers with a clean and efficient API to design and test new models. It features a KG data structure, simple model interfaces and modules for negative sampling and model evaluation. Its main strength is a very fast evaluation module for the link prediction task, a central application of KG embedding. Various KG embedding models are also already implemented. Special attention has been paid to code efficiency and simplicity, documentation and API consistency. It is distributed using PyPI under BSD license. Source code and pointers to documentation and deployment can be found at https://github.com/torchkge-team/torchkge.
This know-it-all AI learns by reading the entire web nonstop
This is a problem if we want AIs to be trustworthy. That's why Diffbot takes a different approach. It is building an AI that reads every page on the entire public web, in multiple languages, and extracts as many facts from those pages as it can. Like GPT-3, Diffbot's system learns by vacuuming up vast amounts of human-written text found online. But instead of using that data to train a language model, Diffbot turns what it reads into a series of three-part factoids that relate one thing to another: subject, verb, object.
NagE: Non-Abelian Group Embedding for Knowledge Graphs
Yang, Tong, Sha, Long, Hong, Pengyu
We demonstrated the existence of a group algebraic structure hidden in relational knowledge embedding problems, which suggests that a group-based embedding framework is essential for designing embedding models. Our theoretical analysis explores merely the intrinsic property of the embedding problem itself hence is model-independent. Motivated by the theoretical analysis, we have proposed a group theory-based knowledge graph embedding framework, in which relations are embedded as group elements, and entities are represented by vectors in group action spaces. We provide a generic recipe to construct embedding models associated with two instantiating examples: SO3E and SU2E, both of which apply a continuous non-Abelian group as the relation embedding. Empirical experiments using these two exampling models have shown state-of-the-art results on benchmark datasets.
More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
Iana, Andreea, Paulheim, Heiko
RDF2vec is an embedding technique for representing knowledge graph entities in a continuous vector space. In this paper, we investigate the effect of materializing implicit A-box axioms induced by subproperties, as well as symmetric and transitive properties. While it might be a reasonable assumption that such a materialization before computing embeddings might lead to better embeddings, we conduct a set of experiments on DBpedia which demonstrate that the materialization actually has a negative effect on the performance of RDF2vec. In our analysis, we argue that despite the huge body of work devoted on completing missing information in knowledge graphs, such missing implicit information is actually a signal, not a defect, and we show examples illustrating that assumption.
PNEL: Pointer Network based End-To-End Entity Linking over Knowledge Graphs
Banerjee, Debayan, Chaudhuri, Debanjan, Dubey, Mohnish, Lehmann, Jens
Question Answering systems are generally modelled as a pipeline consisting of a sequence of steps. In such a pipeline, Entity Linking (EL) is often the first step. Several EL models first perform span detection and then entity disambiguation. In such models errors from the span detection phase cascade to later steps and result in a drop of overall accuracy. Moreover, lack of gold entity spans in training data is a limiting factor for span detector training. Hence the movement towards end-to-end EL models began where no separate span detection step is involved. In this work we present a novel approach to end-to-end EL by applying the popular Pointer Network model, which achieves competitive performance. We demonstrate this in our evaluation over three datasets on the Wikidata Knowledge Graph.