Ontologies
Towards Next-Generation Urban Decision Support Systems through AI-Powered Generation of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation
Tupayachi, Jose, Xu, Haowen, Omitaomu, Olufemi A., Camur, Mustafa Can, Sharmin, Aliza, Li, Xueping
The incorporation of Artificial Intelligence (AI) models into various optimization systems is on the rise. Yet, addressing complex urban and environmental management problems normally requires in-depth domain science and informatics expertise. This expertise is essential for deriving data and simulation-driven for informed decision support. In this context, we investigate the potential of leveraging the pre-trained Large Language Models (LLMs). By adopting ChatGPT API as the reasoning core, we outline an integrated workflow that encompasses natural language processing, methontology-based prompt tuning, and transformers. This workflow automates the creation of scenario-based ontology using existing research articles and technical manuals of urban datasets and simulations. The outcomes of our methodology are knowledge graphs in widely adopted ontology languages (e.g., OWL, RDF, SPARQL). These facilitate the development of urban decision support systems by enhancing the data and metadata modeling, the integration of complex datasets, the coupling of multi-domain simulation models, and the formulation of decision-making metrics and workflow. The feasibility of our methodology is evaluated through a comparative analysis that juxtaposes our AI-generated ontology with the well-known Pizza Ontology employed in tutorials for popular ontology software (e.g., prot\'eg\'e). We close with a real-world case study of optimizing the complex urban system of multi-modal freight transportation by generating anthologies of various domain data and simulations to support informed decision-making.
Elastic CRFs for Open-ontology Slot Filling
Dai, Yinpei, Zhang, Yichi, Liu, Hong, Ou, Zhijian, Huang, Yi, Feng, Junlan
Slot filling is a crucial component in task-oriented dialog systems that is used to parse (user) utterances into semantic concepts called slots. An ontology is defined by the collection of slots and the values that each slot can take. The most widely used practice of treating slot filling as a sequence labeling task suffers from two main drawbacks. First, the ontology is usually pre-defined and fixed and therefore is not able to detect new labels for unseen slots. Second, the one-hot encoding of slot labels ignores the correlations between slots with similar semantics, which makes it difficult to share knowledge learned across different domains. To address these problems, we propose a new model called elastic conditional random field (eCRF), where each slot is represented by the embedding of its natural language description and modeled by a CRF layer. New slot values can be detected by eCRF whenever a language description is available for the slot. In our experiment, we show that eCRFs outperform existing models in both in-domain and cross-domain tasks, especially in predicting unseen slots and values.
Towards Data-Driven Electricity Management: Multi-Region Harmonized Data and Knowledge Graph
Hanžel, Vid, Bertalanič, Blaž, Fortuna, Carolina
Due to growing population and technological advances, global electricity consumption, and consequently also CO2 emissions are increasing. The residential sector makes up 25% of global electricity consumption and has great potential to increase efficiency and reduce CO2 footprint without sacrificing comfort. However, a lack of uniform consumption data at the household level spanning multiple regions hinders large-scale studies and robust multi-region model development. This paper introduces a multi-region dataset compiled from publicly available sources and presented in a uniform format. This data enables machine learning tasks such as disaggregation, demand forecasting, appliance ON/OFF classification, etc. Furthermore, we develop an RDF knowledge graph that characterizes the electricity consumption of the households and contextualizes it with household related properties enabling semantic queries and interoperability with other open knowledge bases like Wikidata and DBpedia. This structured data can be utilized to inform various stakeholders towards data-driven policy and business development.
Uncertainty Management in the Construction of Knowledge Graphs: a Survey
Jarnac, Lucas, Chabot, Yoan, Couceiro, Miguel
Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A or recommendation systems. To build a KG it is a common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG, therefore such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, which represents a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present constructions of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, introducing additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion on the remaining challenges and perspectives when constructing a KG taking into account uncertainty.
Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques
Achara, Akshit, Sasidharan, Sanand, N, Gagan
Clinical text is rich in information, with mentions of treatment, medication and anatomy among many other clinical terms. Multiple terms can refer to the same core concepts which can be referred as a clinical entity. Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities including the definitions, relations and other corresponding information. These ontologies are used for standardization of clinical text by normalizing varying surface forms of a clinical term through Biomedical entity linking. With the introduction of transformer-based language models, there has been significant progress in Biomedical entity linking. In this work, we focus on learning through synonym pairs associated with the entities. As compared to the existing approaches, our approach significantly reduces the training data and resource consumption. Moreover, we propose a suite of context-based and context-less reranking techniques for performing the entity disambiguation. Overall, we achieve similar performance to the state-of-the-art zero-shot and distant supervised entity linking techniques on the Medmentions dataset, the largest annotated dataset on UMLS, without any domain-based training. Finally, we show that retrieval performance alone might not be sufficient as an evaluation metric and introduce an article level quantitative and qualitative analysis to reveal further insights on the performance of entity linking methods.
Machine Learning and Data Analysis Using Posets: A Survey
Posets are discrete mathematical structures which are ubiquitous in a broad range of data analysis and machine learning applications. Research connecting posets to the data science domain has been ongoing for many years. In this paper, a comprehensive review of a wide range of studies on data analysis and machine learning using posets are examined in terms of their theory, algorithms and applications. In addition, the applied lattice theory domain of formal concept analysis will also be highlighted in terms of its machine learning applications.
Learning Visual-Semantic Subspace Representations for Propositional Reasoning
Moreira, Gabriel, Hauptmann, Alexander, Marques, Manuel, Costeira, João Paulo
Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge. Existing approaches are either contrastive, lacking theoretical guarantees, or fall short in effectively representing the partial orders inherent to rich visual-semantic hierarchies. In this paper, we propose a novel approach for learning visual representations that not only conform to a specified semantic structure but also facilitate probabilistic propositional reasoning. Our approach is based on a new nuclear norm-based loss. We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice, where logical propositions can be represented by projection operators.
Artificial Intelligence (AI) in Legal Data Mining
Deroy, Aniket, Bailung, Naksatra Kumar, Ghosh, Kripabandhu, Ghosh, Saptarshi, Chakraborty, Abhijnan
Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend the same. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A great example for a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR) The future of legal ontologies is likely to be handled by computer experts and legal experts alike.
Prompt-Time Ontology-Driven Symbolic Knowledge Capture with Large Language Models
Çöplü, Tolga, Bendiken, Arto, Skomorokhov, Andrii, Bateiko, Eduard, Cobb, Stephen
In applications such as personal assistants, large language models (LLMs) must consider the user's personal information and preferences. However, LLMs lack the inherent ability to learn from user interactions. This paper explores capturing personal information from user prompts using ontology and knowledge-graph approaches. We use a subset of the KNOW ontology, which models personal information, to train the language model on these concepts. We then evaluate the success of knowledge capture using a specially constructed dataset.
ABI Approach: Automatic Bias Identification in Decision-Making Under Risk based in an Ontology of Behavioral Economics
Ramos, Eduardo da C., Campos, Maria Luiza M., Baião, Fernanda
Organizational decision-making is crucial for success, yet cognitive biases can significantly affect risk preferences, leading to suboptimal outcomes. Risk seeking preferences for losses, driven by biases such as loss aversion, pose challenges and can result in severe negative consequences, including financial losses. This research introduces the ABI approach, a novel solution designed to support organizational decision-makers by automatically identifying and explaining risk seeking preferences during decision-making. This research makes a novel contribution by automating the identification and explanation of risk seeking preferences using Cumulative Prospect theory (CPT) from Behavioral Economics. The ABI approach transforms theoretical insights into actionable, real-time guidance, making them accessible to a broader range of organizations and decision-makers without requiring specialized personnel. By contextualizing CPT concepts into business language, the approach facilitates widespread adoption and enhances decision-making processes with deep behavioral insights. Our systematic literature review identified significant gaps in existing methods, especially the lack of automated solutions with a concrete mechanism for automatically identifying risk seeking preferences, and the absence of formal knowledge representation, such as ontologies, for identifying and explaining the risk preferences. The ABI Approach addresses these gaps, offering a significant contribution to decision-making research and practice. Furthermore, it enables automatic collection of historical decision data with risk preferences, providing valuable insights for enhancing strategic management and long-term organizational performance. An experiment provided preliminary evidence on its effectiveness in helping decision-makers recognize their risk seeking preferences during decision-making in the loss domain.