Semantic Networks
Computing Rule-Based Explanations of Machine Learning Classifiers using Knowledge Graphs
Dervakos, Edmund, Menis-Mastromichalakis, Orfeas, Chortaras, Alexandros, Stamou, Giorgos
The use of symbolic knowledge representation and reasoning as a way to resolve the lack of transparency of machine learning classifiers is a research area that lately attracts many researchers. In this work, we use knowledge graphs as the underlying framework providing the terminology for representing explanations for the operation of a machine learning classifier. In particular, given a description of the application domain of the classifier in the form of a knowledge graph, we introduce a novel method for extracting and representing black-box explanations of its operation, in the form of first-order logic rules expressed in the terminology of the knowledge graph.
Towards Loosely-Coupling Knowledge Graph Embeddings and Ontology-based Reasoning
Kaoudi, Zoi, Lorenzo, Abelardo Carlos Martinez, Markl, Volker
Knowledge graph completion (a.k.a.~link prediction), i.e.,~the task of inferring missing information from knowledge graphs, is a widely used task in many applications, such as product recommendation and question answering. The state-of-the-art approaches of knowledge graph embeddings and/or rule mining and reasoning are data-driven and, thus, solely based on the information the input knowledge graph contains. This leads to unsatisfactory prediction results which make such solutions inapplicable to crucial domains such as healthcare. To further enhance the accuracy of knowledge graph completion we propose to loosely-couple the data-driven power of knowledge graph embeddings with domain-specific reasoning stemming from experts or entailment regimes (e.g., OWL2). In this way, we not only enhance the prediction accuracy with domain knowledge that may not be included in the input knowledge graph but also allow users to plugin their own knowledge graph embedding and reasoning method. Our initial results show that we enhance the MRR accuracy of vanilla knowledge graph embeddings by up to 3x and outperform hybrid solutions that combine knowledge graph embeddings with rule mining and reasoning up to 3.5x MRR.
Transfer Knowledge Graphs to Doctor.ai
The ongoing COVID-19 pandemic has taught us many valuable lessons. One of them is that we need to modernise our healthcare system and make it more accessible to all people. Governments around the world have thrown a huge sum of money into digitalisation (here and here). But two years into the pandemic, rich countries such as Germany still use fax and buggy software in their hospitals to transmit data. This glaring technological gap costs lives.
Towards a Theoretical Understanding of Word and Relation Representation
Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily assessed, whereas judging that from their spelling is often impossible (e.g. cat /feline) and to predetermine and store similarities between all words is prohibitively time-consuming, memory intensive and subjective. We focus on word embeddings learned from text corpora and knowledge graphs. Several well-known algorithms learn word embeddings from text on an unsupervised basis by learning to predict those words that occur around each word, e.g. word2vec and GloVe. Parameters of such word embeddings are known to reflect word co-occurrence statistics, but how they capture semantic meaning has been unclear. Knowledge graph representation models learn representations both of entities (words, people, places, etc.) and relations between them, typically by training a model to predict known facts in a supervised manner. Despite steady improvements in fact prediction accuracy, little is understood of the latent structure that enables this. The limited understanding of how latent semantic structure is encoded in the geometry of word embeddings and knowledge graph representations makes a principled means of improving their performance, reliability or interpretability unclear. To address this: 1. we theoretically justify the empirical observation that particular geometric relationships between word embeddings learned by algorithms such as word2vec and GloVe correspond to semantic relations between words; and 2. we extend this correspondence between semantics and geometry to the entities and relations of knowledge graphs, providing a model for the latent structure of knowledge graph representation linked to that of word embeddings.
Learning Representations of Entities and Relations
Encoding facts as representations of entities and binary relationships between them, as learned by knowledge graph representation models, is useful for various tasks, including predicting new facts, question answering, fact checking and information retrieval. The focus of this thesis is on (i) improving knowledge graph representation with the aim of tackling the link prediction task; and (ii) devising a theory on how semantics can be captured in the geometry of relation representations. Most knowledge graphs are very incomplete and manually adding new information is costly, which drives the development of methods which can automatically infer missing facts. The first contribution of this thesis is HypER, a convolutional model which simplifies and improves upon the link prediction performance of the existing convolutional state-of-the-art model ConvE and can be mathematically explained in terms of constrained tensor factorisation. The second contribution is TuckER, a relatively straightforward linear model, which, at the time of its introduction, obtained state-of-the-art link prediction performance across standard datasets. The third contribution is MuRP, first multi-relational graph representation model embedded in hyperbolic space. MuRP outperforms all existing models and its Euclidean counterpart MuRE in link prediction on hierarchical knowledge graph relations whilst requiring far fewer dimensions. Despite the development of a large number of knowledge graph representation models with gradually increasing predictive performance, relatively little is known of the latent structure they learn. We generalise recent theoretical understanding of how semantic relations of similarity, paraphrase and analogy are encoded in the geometric interactions of word embeddings to how more general relations, as found in knowledge graphs, can be encoded in their representations.
Potential Destination Prediction Based on Knowledge Graph Under Low Predictability Data Condition
Li, Guilong, Chen, Yixian, Liao, Qionghua, He, Zhaocheng
Destination prediction has been a critical topic in transportation research, and there are a large number of studies. However, almost all existing studies are based on high predictability data conditions while pay less attention to the data condition with low predictability, where the regularity of single individuals is not exposed. Based on a certain period of observation, there is a fact that individuals may choose destinations beyond observation, which we call "potential destinations". The number of potential destinations is very large and can't be ignored for the data condition with low predictability formed by short-term observation.To reveal the choice pattern of potential destination of individuals under the data condition with low predictability, we propose a global optimization method based on knowledge graph embedding. First, we joint the trip data of all individuals by constructing Trip Knowledge Graph(TKG). Next, we optimize the general algorithm of knowledge graph embedding for our data and task in training strategy and objective function, then implement it on TKG. It can achieve global optimization for association paths that exist between almost any two entities in TKG. On this basis, a method for potential destination prediction is proposed, giving the possible ranking of unobserved destinations for each individual. In addition, we improve the performance by fusing static statistical information that is not passed to TKG. Finally, we validate our method in a real-world dataset, and the prediction results are highly consistent with individuals' potential destination choice behaviour.
A Survey on Visual Transfer Learning using Knowledge Graphs
Monka, Sebastian, Halilaj, Lavdim, Rettinger, Achim
Recent approaches of computer vision utilize deep learning methods as they perform quite well if training and testing domains follow the same underlying data distribution. However, it has been shown that minor variations in the images that occur when using these methods in the real world can lead to unpredictable errors. Transfer learning is the area of machine learning that tries to prevent these errors. Especially, approaches that augment image data using auxiliary knowledge encoded in language embeddings or knowledge graphs (KGs) have achieved promising results in recent years. This survey focuses on visual transfer learning approaches using KGs. KGs can represent auxiliary knowledge either in an underlying graph-structured schema or in a vector-based knowledge graph embedding. Intending to enable the reader to solve visual transfer learning problems with the help of specific KG-DL configurations we start with a description of relevant modeling structures of a KG of various expressions, such as directed labeled graphs, hypergraphs, and hyper-relational graphs. We explain the notion of feature extractor, while specifically referring to visual and semantic features. We provide a broad overview of knowledge graph embedding methods and describe several joint training objectives suitable to combine them with high dimensional visual embeddings. The main section introduces four different categories on how a KG can be combined with a DL pipeline: 1) Knowledge Graph as a Reviewer; 2) Knowledge Graph as a Trainee; 3) Knowledge Graph as a Trainer; and 4) Knowledge Graph as a Peer. To help researchers find evaluation benchmarks, we provide an overview of generic KGs and a set of image processing datasets and benchmarks including various types of auxiliary knowledge. Last, we summarize related surveys and give an outlook about challenges and open issues for future research.
A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals
Santini, Cristian, Gesese, Genet Asefa, Peroni, Silvio, Gangemi, Aldo, Sack, Harald, Alam, Mehwish
Data available in scholarly knowledge graphs (SKGs) - i.e., "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities" [14] - is growing continuously every day, leading to a plethora of challenges concerning, for instance, article exploration and visualization [17], article recommendation [3], citation recommendation [11], and Author Name Disambiguation (AND) [24], which is relevant for the purposes of the present article. In particular, AND refers to a specific task of entity resolution which aims at resolving author mentions in bibliographic references to real-world people. Author persistent identifiers, such as ORCIDs and VIAFs, simplify the AND activity since such identifiers can be used for reconciling entities defined as different objects and representing the same real-world person. However, the availability of such persistent identifiers in SKGs - such as OpenCitations (OC) [22], AMiner [27] and Microsoft Academic Knowledge Graph (MAKG) [10] - is characterized by very low coverage and, as such, additional and computationally-oriented techniques must be adopted to identify different authors as the same person. In the past, many automatic approaches have been developed to automatically address AND by using publications metadata (e.g., title, abstract, keywords, venue, affiliation, etc.) to extract some features which can be used in the disambiguation task. These methods vary widely from supervised learning methods to unsupervised learning including recently developed deep neural network-based architectures [31]. However, the existing SKGs do not provide all the relevant contextual information necessary to reuse effectively and efficiently such approaches, that often rely on pure textual data. In contrast with the approaches mentioned above, this study focuses on performing AND for scholarly data represented as linked data or included in SKGs by considering the multi-modal information available in such collections, i.e., the structural information consisting of entities and relations between them as well as text or numeric values associated with the authors and publications defined in the form of literals (family name, given name, publication title, venue title, year of publication, etc.). The proposed framework to address this task is named Literally Author Name Disambiguation (LAND), which focuses on tackling the following research questions: - Can Knowledge Graph Embeddings (KGEs) - i.e. a technique that enables the creation of a "dense representation of the graph in a continuous, low-dimensional vector space that can then be used for machine learning tasks"[13] - be used effectively for the downstream task of clustering, more specifically for author name disambiguation?
Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis
Perevalov, Aleksandr, Yan, Xi, Kovriguina, Liubov, Jiang, Longquan, Both, Andreas, Usbeck, Ricardo
Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has been a driver for the community for more than ten years while establishing different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing and curated leaderboards leads to a missing global view over the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, miss providing central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community - https://kgqa.github.io/leaderboard. Our analysis highlights existing problems during the evaluation of KGQA systems. Thus, we will point to possible improvements for future evaluations.
Temporal Knowledge Graph Completion: A Survey
Cai, Borui, Xiang, Yong, Gao, Longxiang, Zhang, He, Li, Yunfeng, Li, Jianxin
Knowledge graph completion (KGC) can predict missing links and is crucial for real-world knowledge graphs, which widely suffer from incompleteness. KGC methods assume a knowledge graph is static, but that may lead to inaccurate prediction results because many facts in the knowledge graphs change over time. Recently, emerging methods have shown improved predictive results by further incorporating the timestamps of facts; namely, temporal knowledge graph completion (TKGC). With this temporal information, TKGC methods can learn the dynamic evolution of the knowledge graph that KGC methods fail to capture. In this paper, for the first time, we summarize the recent advances in TKGC research. First, we detail the background of TKGC, including the problem definition, benchmark datasets, and evaluation metrics. Then, we summarize existing TKGC methods based on how timestamps of facts are used to capture the temporal dynamics. Finally, we conclude the paper and present future research directions of TKGC.