"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.
Using open standards for data and knowledge models eliminates proprietary vendor lock-in and lays the foundation for a wide range of applications, from semantic search and text mining to data integration and data analytics. In this way, knowledge models become actionable: they can help find answers in unstructured content, trigger alerts, or support better decisions. SKOS is relatively easy to learn and can provide substantial input for making machine learning tasks more precise.
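As a minimal, library-free sketch of how a SKOS-style broader/narrower hierarchy becomes "actionable" for semantic search: a query term can be expanded to all of its transitively narrower concepts, so a search also matches more specific documents. The concepts below are illustrative examples, not taken from any real thesaurus.

```python
# concept -> set of directly narrower concepts (cf. skos:narrower)
narrower = {
    "vehicle": {"car", "bicycle"},
    "car": {"electric car"},
    "bicycle": set(),
    "electric car": set(),
}

def expand(concept):
    """Return the concept plus everything transitively narrower than it."""
    result = {concept}
    stack = [concept]
    while stack:
        for child in narrower.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

print(sorted(expand("vehicle")))
# ['bicycle', 'car', 'electric car', 'vehicle']
```

A real deployment would load the hierarchy from a SKOS file (e.g., with an RDF library) rather than a hand-written dictionary, but the query-expansion logic is the same.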
Many machine learning algorithms were developed to address a well-known problem in AI called the 'Knowledge Acquisition Bottleneck'. It deals with the question of how subject matter experts (SMEs) can be enabled to work together with data scientists on knowledge models in an efficient and sustainable way (see also: Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling). Machine learning algorithms learn from data, so successful implementations depend strongly on data quality and on the approaches taken to encode the semantics (meaning) of the data. Facing the 'Knowledge Acquisition Bottleneck' also means recognizing experts' knowledge as an essential asset of any organization.
Instead of treating each Machine Learning (ML) method as a "shiny new object", here is an attempt to create a unified picture. There is no consensus when it comes to an ontology for ML methods; organizational principles are simply ways to get our arms around knowledge so that we are not swamped by too many unconnected notions. In chapter 4 ("Modern" ML Methods) of my upcoming book, "SYSTEMS Analytics", we develop the basic theory and algorithms for some key blocks in the diagram above. In ML practice, these ML methods are "wrapped" by "bootstrap" and "consensus" methods.
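A rough sketch of that "wrapping" idea, under my own simplifying assumptions (the trivial majority-class learner and the toy labels are placeholders, not methods from the book): a base learner is fit on bootstrap resamples of the data, and the per-resample outputs are combined by consensus (majority vote).

```python
import random
from collections import Counter

def fit_majority(labels):
    """'Train' a trivial base model: remember the most common label."""
    return Counter(labels).most_common(1)[0][0]

def bagged_predict(labels, n_rounds=25, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_rounds):
        # Bootstrap: sample with replacement, same size as the original data.
        resample = [rng.choice(labels) for _ in labels]
        votes.append(fit_majority(resample))
    # Consensus: majority vote over the bootstrap models.
    return Counter(votes).most_common(1)[0][0]

labels = ["spam", "spam", "ham", "spam", "ham"]
print(bagged_predict(labels))
```

Replacing `fit_majority` with any real learner turns this into classical bagging; the bootstrap/consensus wrapper itself does not change.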
NASA had an analogous problem, and they solved it with the practical application of data management best practices, which included the use of domain-specific ontologies. However, any enterprise information architecture intended to enable horizontal communication between disparate data sources with related and/or potentially different domains (e.g., banking and insurance) must identify a methodology for rapidly merging data and extracting the Key Data Elements (KDEs) necessary for answering essential competency questions. Whether it is an engine overheating or gases reaching a dangerous level as identified by sensor data, a network intrusion identified by real-time network log monitoring, or social and news media feeds indicating a need for risk-reduction procedures, the organization that can quickly identify risk and/or opportunity will have a distinct advantage over its competitors. As described above, the practical applications of ontologies range from NASA integrating data from multiple disparate systems to enable the rapid identification of system failures, to environmental monitoring for oil and gas operations through the Semantic Sensor Network (SSN), to market volatility and risk management in the financial industry.
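The merge-then-ask pattern described above can be sketched in a few lines. In this hedged illustration, two disparate feeds (sensor readings and asset metadata) are joined on a shared key so that a competency question such as "which engines are overheating?" can be answered from the merged KDEs; the field names and threshold values are my own illustrative assumptions.

```python
sensor_feed = [  # e.g., from a telemetry system
    {"sensor_id": "S1", "temp_c": 612},
    {"sensor_id": "S2", "temp_c": 455},
]
asset_db = [  # e.g., from a separate asset-management system
    {"sensor_id": "S1", "asset": "engine-A", "max_temp_c": 600},
    {"sensor_id": "S2", "asset": "engine-B", "max_temp_c": 600},
]

def overheating(readings, assets):
    """Join the two sources on sensor_id and answer the competency question."""
    by_id = {a["sensor_id"]: a for a in assets}
    return [
        by_id[r["sensor_id"]]["asset"]
        for r in readings
        if r["temp_c"] > by_id[r["sensor_id"]]["max_temp_c"]
    ]

print(overheating(sensor_feed, asset_db))  # ['engine-A']
```

An ontology-backed architecture does this same join semantically (shared URIs instead of hard-coded key names), which is what makes it robust when new sources are added.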
Most of us spend a good portion of our time searching for "answers" and "results" using well-known search systems like Google, AltaVista, or Bing. Imagine having at your side a person who has mentally "digested" the relevant (controlled) taxonomies while an answering machine returns thousands of results for your latest question. In this practical, fast-running demonstrator (***), a couple of taxonomies (the thesauri MeSH, STW, EuroVoc, and REEGLE) as well as one OWL ontology (GENEO) can be selected and used as "guides" for filtering out (controlling) domain-relevant search results. Taxonomies (and especially ontologies) can be used efficiently as "domain filters" to control and re-rank search results over large document corpora.
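The "domain filter" idea can be sketched very simply: results from a generic search engine are re-ranked by how many terms from a chosen taxonomy they contain. The mini-taxonomy and documents below are made-up examples, not drawn from the demonstrator or the thesauri named above.

```python
# Flat term set standing in for a medical taxonomy (illustrative only).
medical_taxonomy = {"diabetes", "insulin", "glucose", "pancreas"}

results = [
    "Celebrity diet trends for the summer",
    "Insulin therapy and glucose monitoring in diabetes care",
    "How the pancreas regulates glucose with insulin",
]

def rerank(docs, taxonomy):
    """Order documents by taxonomy-term overlap, highest first (stable)."""
    def score(doc):
        return len(set(doc.lower().split()) & taxonomy)
    return sorted(docs, key=score, reverse=True)

for doc in rerank(results, medical_taxonomy):
    print(doc)
```

A production system would match concept labels and their synonyms (e.g., via SKOS altLabels) rather than bare word overlap, but the control-and-re-rank mechanism is the same.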
From the early days of the computer era, the symbolic approach to thinking focused on the question of how to represent the knowledge that thoughts are about, and the rules that operate on that knowledge. By gathering together in a single virtual "space" all of the information and relationships relevant to a particular thought, the symbolic approach pursues what Daniel Dennett has called a "Cartesian theater": a kind of home for consciousness and thinking. We know facts such as that language processing occurs in Broca's area, in the frontal lobe of the left hemisphere. Around 1960, the linguistics pioneer Noam Chomsky made a bold argument: forget about meaning, forget about thinking, just focus on syntax.
The ISO and White House Roundtable definitions of data quality have some subtle differences. The ISO provides a semantic definition of data quality, which serves as the metadata requirement. According to Liu and Ram's "A Semiotic Framework for Analyzing Data Provenance Research", the word "provenance", used in the context of data, means different things to different people. Much of the discussion around data quality and data discoverability has revolved around metadata and something called ontologies.
This page provides a structured representation (serialized as HTML RDFa) of the description of the entity denoted ("referred to") by the hyperlink that anchors the About: Entity Label text at the top. In conformance with core Web Architecture, the same description data may also be retrieved in a variety of other negotiable serialization formats, which currently include CSV, HTML Microdata, (X)HTML RDFa, N-Triples, Turtle, N3, RDF/JSON, JSON-LD, RDF/XML, Atom, and CXML. This page and its neighbors provide 5-Star Linked Data URIs (Web Super Keys) for HTTP-accessible data. "Link:" response headers are included as part of the HTTP response metadata.
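In practice, those negotiable serialization formats are selected with HTTP content negotiation: the same entity URI is dereferenced with different `Accept` headers. A hedged illustration using Python's standard library follows; the URI is a placeholder, and the requests are only constructed here, not actually sent.

```python
from urllib.request import Request

uri = "http://example.org/about/SomeEntity"  # placeholder entity URI

# Two requests for the same description data in different serializations.
turtle_req = Request(uri, headers={"Accept": "text/turtle"})
jsonld_req = Request(uri, headers={"Accept": "application/ld+json"})

print(turtle_req.get_header("Accept"))   # text/turtle
print(jsonld_req.get_header("Accept"))   # application/ld+json
```

Sending either request with `urllib.request.urlopen` would let the server return the representation matching the `Accept` header, per core Web Architecture.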
The Cognonto demo is powered by an extensive knowledge graph called the KBpedia Knowledge Graph, organized according to the KBpedia Knowledge Ontology (KKO). The KBpedia Knowledge Graph is a structure of more than 39,000 reference concepts linked to 6 major knowledge bases and 20 popular ontologies in use across the Web. It is for these reasons that we developed an extensive knowledge-graph building process that includes a series of tests that are run every time the knowledge graph is modified. The process for checking whether external concepts linked to the KBpedia Knowledge Graph satisfy the structure is the same.
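A simplified sketch of the kind of structural test described above (not KBpedia's actual test suite): every time the graph is modified, verify that each concept's declared parents exist, so no dangling links slip in. The mini-graph is illustrative.

```python
graph = {
    # concept -> list of parent (broader) concepts
    "animal": [],
    "mammal": ["animal"],
    "dog": ["mammal"],
}

def dangling_links(g):
    """Return (child, missing_parent) pairs that break the structure."""
    return [(c, p) for c, parents in g.items() for p in parents if p not in g]

assert dangling_links(graph) == []   # the current graph is consistent

graph["cat"] = ["mamal"]             # a typo introduces a dangling link
print(dangling_links(graph))         # [('cat', 'mamal')]
```

Running such checks on every modification, including when external concepts are linked in, is what keeps a large graph coherent as it grows.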