Hitzler, Pascal
Building Knowledge Graphs Towards a Global Food Systems Datahub
Gelal, Nirmal, Gautam, Aastha, Norouzi, Sanaz Saki, Giordano, Nico, Silva, Claudio Dias da Jr, Francois, Jean Ribert, Onofre, Kelsey Andersen, Nelson, Katherine, Hutchinson, Stacy, Lin, Xiaomao, Welch, Stephen, Lollato, Romulo, Hitzler, Pascal, McGinty, Hande Küçük
Sustainable agricultural production aligns with several sustainability goals established by the United Nations (UN). However, few studies comprehensively examine sustainable agricultural practices across various products and production methods. Such research could provide valuable insights into the diverse factors influencing the sustainability of specific crops and produce, while also identifying practices and conditions that apply universally across all forms of agricultural production. While this research might help us better understand sustainability, the community would still need a consistent set of vocabularies. These consistent vocabularies, which represent the underlying datasets, can then be stored in a global food systems datahub. Standardized vocabularies can encode the information needed for further statistical analyses and AI/ML approaches over these datasets, supporting research targeting sustainable agricultural production. A structured method for representing sustainability information, especially for wheat production, is currently unavailable. To address this gap, we are building a set of ontologies and Knowledge Graphs (KGs) that encode knowledge associated with sustainable wheat production using formal logic. The data for this set of knowledge graphs are collected from public data sources, from experiments we conducted at Kansas State University, and from a Sustainability Workshop that we organized earlier in the year, which helped us collect input from stakeholders across the wheat value chain. The modeling of the ontology (i.e., the schema) for the Knowledge Graph is in progress with the help of our domain experts, following a modular structure based on the KNARM methodology. In this paper, we present our preliminary results and the schemas of our Knowledge Graph and ontologies.
Aligning Generalisation Between Humans and Machines
Ilievski, Filip, Hammer, Barbara, van Harmelen, Frank, Paassen, Benjamin, Saralajew, Sascha, Schmid, Ute, Biehl, Michael, Bolognesi, Marianna, Dong, Xin Luna, Gashteovski, Kiril, Hitzler, Pascal, Marra, Giuseppe, Minervini, Pasquale, Mundt, Martin, Ngomo, Axel-Cyrille Ngonga, Oltramari, Alessandro, Pasi, Gabriella, Saribatur, Zeynep G., Serafini, Luciano, Shawe-Taylor, John, Shwartz, Vered, Skitalinskaya, Gabriella, Stachl, Clemens, van de Ven, Gido M., Villmann, Thomas
Recent advances in AI -- including generative approaches -- have resulted in technology that can support humans in scientific discovery and decision support but may also disrupt democracies and target individuals. The responsible use of AI increasingly shows the need for human-AI teaming, necessitating effective interaction between humans and machines. A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalise. In cognitive science, human generalisation commonly involves abstraction and concept learning. In contrast, AI generalisation encompasses out-of-domain generalisation in machine learning, rule-based reasoning in symbolic AI, and abstraction in neuro-symbolic AI. In this perspective paper, we combine insights from AI and cognitive science to identify key commonalities and differences across three dimensions: notions of generalisation, methods for generalisation, and evaluation of generalisation. We map the different conceptualisations of generalisation in AI and cognitive science along these three dimensions and consider their role in human-AI teaming. This results in interdisciplinary challenges across AI and cognitive science that must be tackled to provide a foundation for effective and cognitively supported alignment in human-AI teaming scenarios.
Accelerating Knowledge Graph and Ontology Engineering with Large Language Models
Shimizu, Cogan, Hitzler, Pascal
Ontology Population using LLMs
Norouzi, Sanaz Saki, Barua, Adrita, Christou, Antrea, Gautam, Nikita, Eells, Andrew, Hitzler, Pascal, Shimizu, Cogan
Knowledge graphs (KGs) are increasingly utilized for data integration, representation, and visualization. While KG population is critical, it is often costly, especially when data must be extracted from unstructured text in natural language, which presents challenges such as ambiguity and complex interpretations. Large Language Models (LLMs) offer promising capabilities for such tasks, excelling in natural language understanding and content generation. However, their tendency to ``hallucinate'' can produce inaccurate outputs. Despite these limitations, LLMs offer rapid and scalable processing of natural language data, and with prompt engineering and fine-tuning, they can approximate human-level performance in extracting and structuring data for KGs. This study investigates LLM effectiveness for KG population, focusing on the Enslaved.org Hub Ontology. In this paper, we report that, compared to the ground truth, LLMs can extract approximately 90% of triples when provided with a modular ontology as guidance in the prompts.
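The ontology-guided prompting described above can be illustrated with a minimal sketch. The ontology fragment, prompt wording, and canned response below are hypothetical stand-ins, not the paper's actual Enslaved.org setup or prompts:

```python
# Sketch: embed a modular ontology fragment in the prompt so the LLM
# emits schema-conformant triples, then parse its line-oriented output.
# All names here are illustrative assumptions.

ONTOLOGY_SNIPPET = """\
Classes: Person, Place, Event
Properties: participatedIn(Person, Event), occurredAt(Event, Place)
"""

def build_prompt(text: str) -> str:
    """Construct a prompt that constrains extraction to the ontology module."""
    return (
        "Extract RDF triples from the text below.\n"
        "Use ONLY the classes and properties in this ontology module:\n"
        f"{ONTOLOGY_SNIPPET}\n"
        "Return one triple per line as: subject | predicate | object\n\n"
        f"Text: {text}"
    )

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Turn the LLM's line-oriented answer into (subject, predicate, object) tuples."""
    triples = []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# A canned string stands in for a real LLM call here.
fake_response = "Maria | participatedIn | Baptism1820\nBaptism1820 | occurredAt | Havana"
print(parse_triples(fake_response))
```

In practice the parsed triples would be validated against the ontology before being loaded into the KG, which is where hallucinated predicates can be filtered out.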
Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction
Sholehrasa, Hossein, Norouzi, Sanaz Saki, Hitzler, Pascal, Jaberi-Douraki, Majid
Integrating structured knowledge from tabular formats poses significant challenges within natural language processing (NLP), mainly when dealing with complex, semi-structured tables like those found in the FeTaQA dataset. These tables require advanced methods to interpret and generate meaningful responses accurately. Traditional approaches, such as SQL and SPARQL, often fail to fully capture the semantics of such data, especially in the presence of irregular table structures like web tables. This paper addresses these challenges by proposing a novel approach that extracts triples directly from tabular data and integrates them with a retrieval-augmented generation (RAG) model to enhance the accuracy, coherence, and contextual richness of responses generated by a fine-tuned GPT-3.5-turbo-0125 model. Our approach significantly outperforms existing baselines on the FeTaQA dataset, particularly excelling in Sacre-BLEU and ROUGE metrics. It effectively generates contextually accurate and detailed long-form answers from tables, showcasing its strength in complex data interpretation.
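One simple way to realize the table-to-triples step is to treat one column as the row's subject and emit a triple for each remaining cell. The helper below is a hedged sketch under that assumption (the function name and sample table are illustrative, not the paper's exact pipeline); its output would then be retrieved as context by the RAG model:

```python
# Sketch: flatten a table into subject-predicate-object triples that a
# RAG pipeline can retrieve as context for long-form answer generation.
def table_to_triples(headers, rows, key_col=0):
    """Use the key column as subject; each other column header becomes a predicate."""
    triples = []
    for row in rows:
        subject = row[key_col]
        for i, header in enumerate(headers):
            if i != key_col:
                triples.append((subject, header, row[i]))
    return triples

headers = ["Player", "Team", "Goals"]
rows = [["Ada", "Lyon", "14"], ["Sam", "Orlando", "9"]]
for triple in table_to_triples(headers, rows):
    print(triple)
```

Irregular web tables (merged cells, missing headers) are exactly where this naive flattening breaks down, which motivates the semantics-aware extraction the paper proposes.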
The S2 Hierarchical Discrete Global Grid as a Nexus for Data Representation, Integration, and Querying Across Geospatial Knowledge Graphs
Stephen, Shirly, Faulk, Mitchell, Janowicz, Krzysztof, Fisher, Colby, Thelen, Thomas, Zhu, Rui, Hitzler, Pascal, Shimizu, Cogan, Currier, Kitty, Schildhauer, Mark, Rehberger, Dean, Wang, Zhangyu, Christou, Antrea
Geospatial Knowledge Graphs (GeoKGs) have become integral to the growing field of Geospatial Artificial Intelligence. Initiatives like the U.S. National Science Foundation's Open Knowledge Network program aim to create an ecosystem of nation-scale, cross-disciplinary GeoKGs that provide AI-ready geospatial data aligned with FAIR principles. However, building this infrastructure presents key challenges, including 1) managing large volumes of data, 2) the computational complexity of discovering topological relations via SPARQL, and 3) conflating multi-scale raster and vector data. Discrete Global Grid Systems (DGGS) help tackle these issues by offering efficient data integration and representation strategies. The KnowWhereGraph utilizes Google's S2 Geometry -- a DGGS framework -- to enable efficient multi-source data processing, qualitative spatial querying, and cross-graph integration. This paper outlines the implementation of S2 within KnowWhereGraph, emphasizing its role in topologically enriching and semantically compressing data. Ultimately, this work demonstrates the potential of DGGS frameworks, particularly S2, for building scalable GeoKGs.
The KnowWhereGraph Ontology
Shimizu, Cogan, Stephen, Shirly, Barua, Adrita, Cai, Ling, Christou, Antrea, Currier, Kitty, Dalal, Abhilekha, Fisher, Colby K., Hitzler, Pascal, Janowicz, Krzysztof, Li, Wenwen, Liu, Zilong, Mahdavinejad, Mohammad Saeid, Mai, Gengchen, Rehberger, Dean, Schildhauer, Mark, Shi, Meilin, Norouzi, Sanaz Saki, Tian, Yuanyuan, Wang, Sizhe, Wang, Zhangyu, Zalewski, Joseph, Zhou, Lu, Zhu, Rui
KnowWhereGraph is one of the largest fully publicly available geospatial knowledge graphs. It includes data from 30 layers on natural hazards (e.g., hurricanes, wildfires), climate variables (e.g., air temperature, precipitation), soil properties, crop and land-cover types, demographics, human health, and various place and region identifiers, among other themes. These have been leveraged through the graph by a variety of applications to address challenges in food security and agricultural supply chains; sustainability related to soil conservation practices and farm labor; and delivery of emergency humanitarian aid following a disaster. In this paper, we introduce the ontology that acts as the schema for KnowWhereGraph. This broad overview provides insight into the requirements and design specifications for the graph and its schema, including the development methodology (modular ontology modeling) and the resources utilized to implement, materialize, and deploy KnowWhereGraph with its end-user interfaces and public SPARQL query endpoint.
ConceptLens: from Pixels to Understanding
Dalal, Abhilekha, Hitzler, Pascal
ConceptLens is an innovative tool designed to illuminate the intricate workings of deep neural networks (DNNs) by visualizing hidden neuron activations. By integrating deep learning with symbolic methods, ConceptLens offers users a unique way to understand what triggers neuron activations and how they respond to various stimuli. The tool uses error-margin analysis to provide insights into the confidence levels of neuron activations, thereby enhancing the interpretability of DNNs. This paper presents an overview of ConceptLens, its implementation, and its application in real-time visualization of neuron activations and error margins through bar charts.
Error-margin Analysis for Hidden Neuron Activation Labels
Dalal, Abhilekha, Rayan, Rushrukh, Hitzler, Pascal
Understanding how high-level concepts are represented within artificial neural networks is a fundamental challenge in the field of artificial intelligence. While the existing literature in explainable AI emphasizes the importance of labeling neurons with concepts to understand their functioning, it mostly focuses on identifying what stimulus activates a neuron in most cases; this corresponds to the notion of recall in information retrieval. We argue that this is only the first part of a two-part job: it is imperative to also investigate neuron responses to other stimuli, i.e., their precision. We call this the neuron label's error margin.
On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis
Dalal, Abhilekha, Rayan, Rushrukh, Barua, Adrita, Vasserman, Eugene Y., Sarker, Md Kamruzzaman, Hitzler, Pascal
A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would help answer the question of what a deep learning system internally detects as relevant in the input, demystifying the otherwise black-box nature of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. This is particularly the case for approaches that can both draw explanations from substantial background knowledge, and that are based on inherently explainable (symbolic) methods. In this paper, we introduce a novel model-agnostic post-hoc Explainable AI method demonstrating that it provides meaningful interpretations. Our approach is based on using a Wikipedia-derived concept hierarchy with approximately 2 million classes as background knowledge, and utilizes OWL-reasoning-based Concept Induction for explanation generation. Additionally, we explore and compare the capabilities of off-the-shelf pre-trained multimodal-based explainable methods. Our results indicate that our approach can automatically attach meaningful class expressions as explanations to individual neurons in the dense layer of a Convolutional Neural Network. Evaluation through statistical analysis and degree of concept activation in the hidden layer show that our method provides a competitive edge in both quantitative and qualitative aspects compared to prior work.