Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graphs

Neural Information Processing Systems

In this paper, we investigate a realistic but underexplored problem, few-shot temporal knowledge graph reasoning, which aims to predict future facts for newly emerging entities based on extremely limited observations in evolving graphs. It offers practical value in applications that need to derive instant new knowledge about new entities in temporal knowledge graphs (TKGs) with minimal supervision. The challenges mainly stem from the few-shot and time-shift properties of new entities. First, the limited observations associated with them are insufficient for training a model from scratch. Second, the potentially dynamic distribution shift from the initially observed facts to the future facts calls for explicitly modeling the evolving characteristics of new entities.
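The setting the abstract describes can be made concrete with a toy example: a TKG stores facts as (subject, relation, object, timestamp) quadruples, and a newly emerging entity has only K observed facts (a support set) from which its later facts (the query set) must be predicted. The following sketch uses invented entity and relation names purely for illustration:

```python
def split_support_query(facts, new_entity, k):
    """Return the K earliest facts involving `new_entity` as the support
    set, and its remaining (future) facts as the query set."""
    own = sorted(
        (f for f in facts if new_entity in (f[0], f[2])),
        key=lambda f: f[3],  # sort by timestamp
    )
    return own[:k], own[k:]

facts = [
    ("acme_corp", "founded_in", "berlin", 1),
    ("new_startup", "spun_off_from", "acme_corp", 5),  # entity emerges at t=5
    ("new_startup", "hired", "alice", 6),
    ("new_startup", "acquired_by", "mega_corp", 9),    # future fact to predict
]

support, query = split_support_query(facts, "new_startup", k=2)
```

The time-shift challenge the abstract mentions is visible even here: the support facts (t=5, 6) and the query fact (t=9) come from different time ranges, so their distributions may differ.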


Inductive Logical Query Answering in Knowledge Graphs

Neural Information Processing Systems

Formulating and answering logical queries is a standard communication interface for knowledge graphs (KGs). Alleviating the notorious incompleteness of real-world KGs, neural methods have achieved impressive results in link prediction and complex query answering tasks by learning representations of entities, relations, and queries. Still, most existing query answering methods rely on transductive entity embeddings and cannot generalize to KGs containing new entities without retraining entity embeddings. In this work, we study the inductive query answering task, where inference is performed on a graph containing new entities with queries over both seen and unseen entities. To this end, we devise two mechanisms leveraging inductive node and relational structure representations powered by graph neural networks (GNNs). Experimentally, we show that inductive models are able to perform logical reasoning at inference time over unseen nodes, generalizing to graphs up to 500% larger than the training ones. Exploring the efficiency-effectiveness trade-off, we find that the inductive relational structure representation method generally achieves higher performance, while the inductive node representation method is able to answer complex queries in the inference-only regime, without any training on queries, and scales to graphs of millions of nodes.
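The "logical queries" the abstract refers to can be illustrated symbolically: on a complete KG, a conjunctive query is answered by relation traversals combined with set intersection, which is exactly what the paper's neural methods approximate on incomplete KGs. A minimal sketch with invented facts (an intersection query: entities located in Europe that are also a capital):

```python
from collections import defaultdict

edges = defaultdict(set)  # (head, relation) -> set of tails
for h, r, t in [
    ("paris", "located_in", "europe"),
    ("berlin", "located_in", "europe"),
    ("tokyo", "located_in", "asia"),
    ("paris", "capital_of", "france"),
    ("tokyo", "capital_of", "japan"),
]:
    edges[(h, r)].add(t)

# Invert edges once so we can ask "which heads relate to this tail?"
inverse = defaultdict(set)
for (h, r), tails in edges.items():
    for t in tails:
        inverse[(t, r)].add(h)

def inverse_project(entities, relation):
    """Reverse relation projection: all heads with `relation` to any entity."""
    return {h for e in entities for h in inverse[(e, relation)]}

in_europe = inverse_project({"europe"}, "located_in")      # European entities
is_capital = {h for (h, r) in edges if r == "capital_of"}  # capitals of anything
answer = in_europe & is_capital                            # intersection query
```

A transductive model would need a trained embedding for every entity appearing here; the paper's inductive setting requires answering such queries even when some entities were unseen at training time.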


TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Neural Information Processing Systems

In our continuously evolving world, entities change over time and new, previously non-existing or unknown, entities appear. We study how this evolutionary scenario impacts performance on a well-established entity linking (EL) task. For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities and these target entities' descriptions. By capturing such temporal aspects, our newly introduced TempEL resource contrasts with currently existing entity linking datasets, which are composed of fixed mentions linked to a single static version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA). Indeed, for each of our collected temporal snapshots, TempEL contains links to entities that are continual, i.e., occur in all of the years, as well as completely new entities that appear for the first time at some point. Thus, TempEL enables quantifying the performance of current state-of-the-art EL models for: (i) entities that are subject to changes over time in their Knowledge Base descriptions as well as their mentions' contexts, and (ii) newly created entities that were previously non-existing (e.g., at the time the EL model was trained). Our experimental results show that in terms of temporal performance degradation, (i) continual entities suffer a decrease of up to 3.1% EL accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This highlights the challenge of the introduced TempEL dataset and opens new research prospects in the area of time-evolving entity disambiguation.


Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes

Wu, Guanchen, Xie, Yuzhang, Wu, Huanwei, He, Zhe, Shao, Hui, Hu, Xiao, Yang, Carl

arXiv.org Artificial Intelligence

Integrating novel medical concepts and relationships into existing ontologies can significantly enhance their coverage and utility for both biomedical research and clinical applications. Clinical notes, as unstructured documents rich with detailed patient observations, offer valuable context-specific insights and represent a promising yet underutilized source for ontology extension. Despite this potential, directly leveraging clinical notes for ontology extension remains largely unexplored. To address this gap, we propose CLOZE, a novel framework that uses large language models (LLMs) to automatically extract medical entities from clinical notes and integrate them into hierarchical medical ontologies. By capitalizing on the strong language understanding and extensive biomedical knowledge of pre-trained LLMs, CLOZE effectively identifies disease-related concepts and captures complex hierarchical relationships. The zero-shot framework requires no additional training or labeled data, making it a cost-efficient solution. Furthermore, CLOZE ensures patient privacy through automated removal of protected health information (PHI). Experimental results demonstrate that CLOZE provides an accurate, scalable, and privacy-preserving ontology extension framework, with strong potential to support a wide range of downstream applications in biomedical research and clinical informatics.
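The pipeline stages the abstract describes (PHI removal, concept extraction, hierarchical placement) can be sketched end to end. The regex-based PHI scrub and keyword matcher below are crude stand-ins for CLOZE's LLM components, and all names, patterns, and terms are invented for illustration:

```python
import re

def remove_phi(note):
    # Naive stand-in for PHI removal: mask a capitalized surname
    # following the word "Patient". Real de-identification is far broader.
    return re.sub(r"Patient [A-Z][a-z]+", "Patient [REDACTED]", note)

def extract_concepts(note, vocabulary):
    # Stand-in for LLM-based extraction: match known disease terms.
    return [term for term in vocabulary if term in note.lower()]

def extend_ontology(ontology, parent, concepts):
    # Attach newly found concepts under a parent node, skipping duplicates.
    ontology.setdefault(parent, [])
    for c in concepts:
        if c not in ontology[parent]:
            ontology[parent].append(c)
    return ontology

note = "Patient Smith presents with type 2 diabetes and hypertension."
clean = remove_phi(note)
found = extract_concepts(clean, ["type 2 diabetes", "hypertension", "asthma"])
onto = extend_ontology({"disease": []}, "disease", found)
```

In the actual framework, both the extraction and the hierarchical placement are handled zero-shot by a pre-trained LLM rather than by keyword matching.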


Supplementary Material of Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graphs

Wang, Ruijie

Neural Information Processing Systems

The supplementary material is structured as follows: Section A.1 gives the statement, proof, and analysis of Theorem 3.1; Section A.2 introduces the datasets and their statistics in detail; Section A.3 introduces the baselines used in the experiments; Section A.4 discusses the experimental setup of the baseline models as well as MetaTKGR; and Section A.5 reports detailed experimental results with statistical tests. Theorem 3.1 shows that the generalization ability of the meta-learner can be improved over time through a step-by-step update. Among the datasets, the Integrated Crisis Early Warning System (ICEWS18) is a collection of coded interactions between socio-political actors extracted from news articles; the datasets also include YAGO. Figure 1 shows the number of new entities appearing over time, illustrating that new entities continuously emerge on all three public TKGs; Figure 2 shows the corresponding distributions.
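The step-by-step meta-update mentioned for Theorem 3.1 is of the generic adapt-then-update kind used in first-order meta-learning. The toy sketch below (with an invented quadratic loss and made-up learning rates; it does not reproduce MetaTKGR's actual objective) shows the pattern: adapt on a support set, then move the meta-parameters using the query-set gradient at the adapted point.

```python
def grad(loss_fn, theta, eps=1e-6):
    # Central-difference numerical gradient of a scalar-parameter loss.
    return (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)

def meta_step(theta, support_loss, query_loss, inner_lr=0.1, outer_lr=0.1):
    # Inner step: adapt to the support set of one new entity.
    adapted = theta - inner_lr * grad(support_loss, theta)
    # Outer (first-order) step: update meta-parameters with the
    # query-set gradient evaluated at the adapted parameters.
    return theta - outer_lr * grad(query_loss, adapted)

support = lambda t: (t - 2.0) ** 2  # toy losses minimized at 2.0 and 3.0
query = lambda t: (t - 3.0) ** 2

theta = 0.0
for _ in range(100):
    theta = meta_step(theta, support, query)
```

The meta-parameters settle at a point balancing both losses under the inner adaptation, rather than at either minimum alone, which is the behavior such generalization analyses characterize.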



TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Neural Information Processing Systems

License: TempEL is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. In this section we provide more detailed documentation of the dataset along with its intended uses, answering questions such as: For what purpose was the dataset created? Who created the dataset, and on behalf of which entity? Who funded the creation of the dataset? What do the instances that comprise the dataset represent?


TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Neural Information Processing Systems

Entity linking (EL) is a well-established task that is concerned with mapping anchor mentions in text to target entities that describe them in a Knowledge Base (KB) (e.g., Wikipedia).


