Semantic Networks
Enriching Wikidata with Linked Open Data
Zhang, Bohui, Ilievski, Filip, Szekely, Pedro
Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users' needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.
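The gap detection step named above can be illustrated with a minimal sketch. It assumes that entity and property alignments are already given as plain dictionaries and that a "gap" means an (entity, property) pair with no value in Wikidata. The function name, the toy triples, and `dbo:unmappedProperty` are hypothetical and are not taken from the paper's code.

```python
def find_novel_statements(wikidata, external, entity_map, property_map):
    """Return external statements that fill (entity, property) gaps in Wikidata.

    All arguments are toy structures assumed for illustration:
    triples are (subject, property, value) tuples, maps are plain dicts.
    """
    # Every (entity, property) pair that already has at least one value.
    covered = {(subj, prop) for subj, prop, _ in wikidata}
    novel = []
    for ext_subj, ext_prop, value in external:
        qid = entity_map.get(ext_subj)
        pid = property_map.get(ext_prop)
        if qid is None or pid is None:
            continue  # no alignment available, so the statement is skipped
        if (qid, pid) not in covered:
            # The value is kept in the external vocabulary; translating it
            # and validating it semantically are separate steps.
            novel.append((qid, pid, value))
    return novel


wikidata = [("Q42", "P569", "1952-03-11")]
external = [
    ("dbr:Douglas_Adams", "dbo:birthDate", "1952-03-11"),
    ("dbr:Douglas_Adams", "dbo:birthPlace", "dbr:Cambridge"),
    ("dbr:Douglas_Adams", "dbo:unmappedProperty", "x"),  # hypothetical property
]
entity_map = {"dbr:Douglas_Adams": "Q42"}
property_map = {"dbo:birthDate": "P569", "dbo:birthPlace": "P19"}

novel = find_novel_statements(wikidata, external, entity_map, property_map)
print(novel)
```

Only the birth place statement is reported here: the birth date is already covered in the toy Wikidata list, and the unmapped property has no alignment.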
Evaluating and improving social awareness of energy communities through semantic network analysis of online news
Piselli, C., Colladon, A. Fronzetti, Segneri, L., Pisello, A. L.
The implementation of energy communities represents a cross-disciplinary phenomenon that has the potential to support the energy transition while fostering citizens' participation throughout the energy system and their exploitation of renewables. An important role is played by online information sources in engaging people in this process and increasing their awareness of associated benefits. In this view, this work analyses online news data on energy communities to understand people's awareness and the media importance of this topic. We use the Semantic Brand Score (SBS) indicator as an innovative measure of semantic importance, combining social network analysis and text mining methods. Results show different importance trends for energy communities and other energy and society-related topics, also allowing the identification of their connections. Our approach provides evidence of information gaps and of possible actions that could be taken to promote a low-carbon energy transition.
DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning
Chen, Qianglong, Li, Feng-Lin, Xu, Guohai, Yan, Ming, Zhang, Ji, Zhang, Yin
Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge-driven tasks. Despite the many efforts made for injecting knowledge into PLMs, this problem remains open. To address the challenge, we propose DictBERT, a novel approach that enhances PLMs with dictionary knowledge, which is easier to acquire than a knowledge graph (KG). During pre-training, we present two novel pre-training tasks to inject dictionary knowledge into PLMs via contrastive learning: dictionary entry prediction and entry description discrimination. In fine-tuning, we use the pre-trained DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for identified entries in an input sequence, and infuse the retrieved knowledge into the input to enhance its representation via a novel extra-hop attention mechanism. We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it gains a substantial improvement of 0.5%, 2.9%, 9.0%, 7.1% and 3.3% on BERT-large respectively, and is also effective on RoBERTa-large.
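As a rough illustration of contrastive learning over dictionary entries and descriptions, the sketch below computes an InfoNCE-style loss for one entry vector against one matching description and several non-matching ones. InfoNCE is a common contrastive objective used here as an assumption; the abstract does not state which loss DictBERT uses, and the function name and toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax score of the positive pair.

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Toy 2-d vectors standing in for encoder outputs (assumed values).
entry = [1.0, 0.0]
matching_description = [0.9, 0.1]
other_descriptions = [[0.0, 1.0], [-1.0, 0.0]]

good = info_nce_loss(entry, matching_description, other_descriptions)
bad = info_nce_loss(entry, other_descriptions[0],
                    [matching_description, other_descriptions[1]])
print(good < bad)
```

Treating the matching description as the positive gives a much lower loss than treating an unrelated description as the positive, which is the signal a contrastive pre-training task optimizes.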
Knowledge Graph
Qi, Guilin, Chen, Huajun, Liu, Kang, Wang, Haofen, Ji, Qiu, Wu, Tianxing (ISBN 9789811081767)
Dr. Guilin Qi is a professor at Southeast University, China, where he also serves as director of the Institute of Cognitive Intelligence and of the Knowledge Science and Engineering Lab. His research interests include knowledge representation and reasoning, knowledge graphs, uncertainty reasoning, and the semantic web. Prof. Qi is an editorial board member of the Journal of Web Semantics, and has co-edited special issues for the Annals of Mathematics and Artificial Intelligence, International Journal of Approximate Reasoning and Journal of Applied Logic. He has over 20 years of research experience in knowledge engineering and has led many national and industrial projects on knowledge graphs. Prof. Qi has published more than 100 papers on knowledge engineering and knowledge graphs and holds two patents.
KG-NSF: Knowledge Graph Completion with a Negative-Sample-Free Approach
Bahaj, Adil, Lhazmir, Safae, Ghogho, Mounir
Knowledge Graph (KG) completion is an important task that greatly benefits knowledge discovery in many fields (e.g. biomedical research). In recent years, learning KG embeddings to perform this task has received considerable attention. Despite the success of KG embedding methods, they predominantly use negative sampling, resulting in increased computational complexity as well as biased predictions due to the closed world assumption. To overcome these limitations, we propose KG-NSF, a negative sampling-free framework for learning KG embeddings based on the cross-correlation matrices of embedding vectors. It is shown that the proposed method achieves comparable link prediction performance to negative sampling-based methods while converging much faster.
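To make the idea of a cross-correlation matrix of embedding vectors concrete, the sketch below computes such a matrix for two batches of embeddings and scores it with a redundancy-reduction loss that pushes the matrix toward the identity. This objective is an assumption borrowed from self-supervised learning, not the confirmed KG-NSF loss. What the two batches represent (for example, two views of the same positive triples) is also assumed, and all names and values are hypothetical.

```python
import math


def standardize(batch):
    """Center and scale each embedding dimension across the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [row[k] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 for a constant dimension to avoid division by zero.
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        for r in range(n):
            out[r][k] = (batch[r][k] - mean) / std
    return out


def cross_correlation(a, b):
    """d x d matrix of correlations between dimensions of a and of b."""
    za, zb = standardize(a), standardize(b)
    n, d = len(za), len(za[0])
    return [[sum(za[r][i] * zb[r][j] for r in range(n)) / n
             for j in range(d)] for i in range(d)]


def decorrelation_loss(c, off_diag_weight=0.005):
    """Penalize diagonal entries far from 1 and off-diagonal entries far from 0."""
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag


# Toy batch of four 2-d embeddings (assumed values).
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
view_b = [[-x for x in row] for row in view_a]

aligned = decorrelation_loss(cross_correlation(view_a, view_a))
opposed = decorrelation_loss(cross_correlation(view_a, view_b))
print(aligned, opposed)
```

No negative triples appear anywhere in this computation: the loss depends only on statistics of embeddings that are supposed to agree, which is the general property a negative-sample-free objective relies on.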
μKG: A Library for Multi-source Knowledge Graph Embeddings and Applications
Luo, Xindi, Sun, Zequn, Hu, Wei
This paper presents μKG, an open-source Python library for representation learning over knowledge graphs. μKG supports joint representation learning over multi-source knowledge graphs (and also a single knowledge graph), multiple deep learning libraries (PyTorch and TensorFlow2), multiple embedding tasks (link prediction, entity alignment, entity typing, and multi-source link prediction), and multiple parallel computing modes (multi-process and multi-GPU computing). It currently implements 26 popular knowledge graph embedding models and supports 16 benchmark datasets. μKG provides advanced implementations of embedding techniques with simplified pipelines of different tasks. It also comes with high-quality documentation for ease of use. μKG is more comprehensive than existing knowledge graph embedding libraries. It is useful for a thorough comparison and analysis of various embedding models and tasks. We show that the jointly learned embeddings can greatly help knowledge-powered downstream tasks, such as multi-hop knowledge graph question answering. We will stay abreast of the latest developments in the related fields and incorporate them into μKG.
Facing Changes: Continual Entity Alignment for Growing Knowledge Graphs
Wang, Yuxin, Cui, Yuanning, Liu, Wenqiang, Sun, Zequn, Jiang, Yiqiao, Han, Kexin, Hu, Wei
Entity alignment is a basic and vital technique in knowledge graph (KG) integration. Over the years, research on entity alignment has rested on the assumption that KGs are static, which neglects the growth of real-world KGs. As KGs grow, previous alignment results need to be revisited while new entity alignments wait to be discovered. In this paper, we propose and dive into a realistic yet unexplored setting, referred to as continual entity alignment. To avoid retraining an entire model on the whole KGs whenever new entities and triples arrive, we present a continual alignment method for this task. It reconstructs an entity's representation based on entity adjacency, enabling it to generate embeddings for new entities quickly and inductively using their existing neighbors. It selects and replays partial pre-aligned entity pairs to train only parts of KGs while extracting trustworthy alignment for knowledge augmentation. As growing KGs inevitably contain non-matchable entities, the proposed method, unlike previous works, employs bidirectional nearest neighbor matching to find new entity alignment and update old alignment. Furthermore, we also construct new datasets by simulating the growth of multilingual DBpedia. Extensive experiments demonstrate that our continual alignment method is more effective than baselines based on retraining or inductive learning.
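Bidirectional nearest neighbor matching can be sketched as follows: a source entity and a target entity are paired only if each is the other's nearest neighbor, so entities without a mutual match stay unaligned. This is a minimal illustration under assumed choices (cosine similarity, toy 2-d embeddings, hypothetical function names), not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mutual_nearest_neighbors(src, tgt):
    """Return (i, j) pairs where tgt[j] is nearest to src[i] and vice versa."""
    if not src or not tgt:
        return []
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Nearest target for every source entity, and nearest source for every target.
    best_tgt = [max(range(len(tgt)), key=lambda j: sim[i][j])
                for i in range(len(src))]
    best_src = [max(range(len(src)), key=lambda i: sim[i][j])
                for j in range(len(tgt))]
    # Keep a pair only when the choice is mutual.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy embeddings (assumed values); the third source entity has no counterpart.
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[0.9, 0.1], [0.1, 0.9]]

pairs = mutual_nearest_neighbors(src, tgt)
print(pairs)
```

The third source entity does pick a nearest target, but that target prefers another source entity, so the one-directional match is discarded. This is how the mutual check can leave non-matchable entities out of the alignment.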
On a Generalized Framework for Time-Aware Knowledge Graphs
Krause, Franz, Weller, Tobias, Paulheim, Heiko
Knowledge graphs have emerged as an effective tool for managing and standardizing semistructured domain knowledge in a human- and machine-interpretable way. In terms of graph-based domain applications, such as embeddings and graph neural networks, current research is increasingly taking into account the time-related evolution of the information encoded within a graph. Algorithms and models for stationary and static knowledge graphs are extended to make them accessible for time-aware domains, where time-awareness can be interpreted in different ways. In particular, a distinction needs to be made between the validity period and the traceability of facts as objectives of time-related knowledge graph extensions. In this context, terms and definitions such as dynamic and temporal are often used inconsistently or interchangeably in the literature. Therefore, with this paper we aim to provide a short but well-defined overview of time-aware knowledge graph extensions and thus facilitate future research in this field as well.
QuoteKG: A Multilingual Knowledge Graph of Quotes
Kuculo, Tin, Gottschalk, Simon, Demidova, Elena
Quotes of public figures can mark turning points in history. A quote can explain its originator's actions, foreshadowing political or personal decisions and revealing character traits. Impactful quotes cross language barriers and influence the general population's reaction to specific stances, always facing the risk of being misattributed or taken out of context. The provision of a cross-lingual knowledge graph of quotes that establishes the authenticity of quotes and their contexts is of great importance to allow the exploration of the lives of important people as well as topics from the perspective of what was actually said. In this paper, we present QuoteKG, the first multilingual knowledge graph of quotes. We propose the QuoteKG creation pipeline that extracts quotes from Wikiquote, a free and collaboratively created collection of quotes in many languages, and aligns different mentions of the same quote. QuoteKG includes nearly one million quotes in 55 languages, said by more than 69,000 people of public interest across a wide range of topics. QuoteKG is publicly available and can be accessed via a SPARQL endpoint.
Hardware-agnostic Computation for Large-scale Knowledge Graph Embeddings
Demir, Caglar, Ngomo, Axel-Cyrille Ngonga
Knowledge graph embedding research has mainly focused on learning continuous representations of knowledge graphs for the link prediction problem. Recently developed frameworks can be effectively applied in research-related applications. Yet, these frameworks do not fulfill many requirements of real-world applications. As the size of the knowledge graph grows, moving computation from a commodity computer to a cluster of computers in these frameworks becomes more challenging. Finding suitable hyperparameter settings w.r.t. time and computational budgets is left to practitioners. In addition, the continual learning aspect in knowledge graph embedding frameworks is often ignored, although continual learning plays an important role in many real-world (deep) learning-driven applications. Arguably, these limitations explain the lack of publicly available knowledge graph embedding models for large knowledge graphs. We developed a framework based on Dask, PyTorch Lightning and Hugging Face to compute embeddings for large-scale knowledge graphs in a hardware-agnostic manner, which is able to address real-world challenges pertaining to the scale of real applications. We provide an open-source version of our framework along with a hub of pre-trained models having more than 11.4 B parameters.