 Ontologies


From Internet of Things Data to Business Processes: Challenges and a Framework

arXiv.org Artificial Intelligence

In IoT environments, large amounts of procedural data are generated by IoT devices, information systems, and other software applications. Using this data can foster innovative applications in process control [63, 75, 56, 54, 35, 52, 42, 68], process conformance checking [23, 81, 83, 28], and process enhancement [67, 59], among others. In particular, applying process mining techniques to both process data and IoT-collected data could provide important insights into processes and their interactions, as shown by several applications in the manufacturing domain [58, 75, 56, 59, 67]. In these applications, IoT actuators realize and execute process activities, while IoT sensors and smart tags closely monitor the execution environment and the resources involved [79, 75, 26, 37, 54]. IoT technology can therefore capture the context in which process tasks are performed, allowing process mining techniques to better understand and analyze the processes [7, 76, 12]. Consequently, besides the procedural data generated by process execution systems, the data captured by IoT devices should also be treated as an integral part of process execution, in the form of IoT-enriched event logs [57, 53]. Both the procedural nature of sensor logs and their tight integration with process executions and executing resources [24] make sensor data an integral part of process-based application scenarios in IoT [76, 75, 7]. However, IoT data and process data are still often integrated for process mining ex post, manually, in a separate pre-processing phase [95, 73, 53]. In these cases, data from the IoT environment is collected and stored separately, and only later is it explicitly connected to the notion of a process. This connection is non-trivial, as pointed out by the challenge "Bridging the Gap Between Event-based and Process-based Systems" in the BPM-IoT manifesto [37].
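The ex-post integration step described above can be pictured as attaching sensor context to each process event by timestamp. The following is a minimal sketch, not the framework proposed in the paper: it assumes events and sensor readings share one clock, and the `Event` class, `enrich_events` function, and `max_lag` parameter are hypothetical names chosen for illustration. Each event receives the latest reading of every sensor taken at or before it, provided that reading is not older than `max_lag` seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Event:
    case_id: str
    activity: str
    timestamp: float  # seconds; a clock shared by events and sensors is assumed
    context: dict = field(default_factory=dict)


def enrich_events(events, readings, max_lag=60.0):
    """Return copies of `events` with sensor values added to their context.

    readings: dict mapping sensor name -> list of (timestamp, value),
    sorted by timestamp (an assumption of this sketch).
    """
    # Pre-extract the sorted timestamps per sensor for binary search.
    index = {sensor: [t for t, _ in series] for sensor, series in readings.items()}
    enriched = []
    for ev in events:
        ctx = dict(ev.context)
        for sensor, series in readings.items():
            i = bisect_right(index[sensor], ev.timestamp)
            if i == 0:
                continue  # no reading at or before this event
            t, value = series[i - 1]
            if ev.timestamp - t <= max_lag:  # ignore stale readings
                ctx[sensor] = value
        enriched.append(Event(ev.case_id, ev.activity, ev.timestamp, ctx))
    return enriched
```

A time-window join like this is only one possible correlation rule; real IoT-enriched logs may also need to link readings to cases or resources rather than to timestamps alone.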


A Farewell to Harms: Risk Management for Medical Devices via the Riskman Ontology & Shapes

arXiv.org Artificial Intelligence

We introduce the Riskman ontology & shapes for representing and analysing information about risk management for medical devices. Risk management is concerned with taking the necessary precautions so that a medical device does not cause harm to users or the environment. To date, risk management documentation is submitted to notified bodies (for certification) as semi-structured natural language text. We propose using classes from the Riskman ontology to logically model risk management documentation, and using the included SHACL constraints to check for syntactic completeness and conformity to relevant standards. In particular, the ontology is modelled after ISO 14971 and the recently published VDE Spec 90025. Our proposed methodology has the potential to save many person-hours for both manufacturers (when creating risk management documentation) and notified bodies (when assessing submitted applications for certification), and thus offers considerable benefits for healthcare and, by extension, society as a whole.
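To illustrate what a SHACL-style completeness check does, here is a minimal sketch in plain Python that mimics a node shape with `sh:minCount` constraints over a list of triples. It is not the Riskman shapes: the class `RiskItem` and the properties `hasHazard` and `hasRiskControl` are hypothetical placeholders, and a real deployment would run the ontology's own shapes through a SHACL engine (for example pySHACL) instead.

```python
# Hypothetical shape: every RiskItem must have at least one hazard
# and at least one risk control measure (names are illustrative only).
SHAPE = {
    "target_class": "RiskItem",
    "min_count": {"hasHazard": 1, "hasRiskControl": 1},
}


def validate(triples, shape):
    """Return (node, property, found, required) for each violated minCount."""
    targets = {s for s, p, o in triples
               if p == "rdf:type" and o == shape["target_class"]}
    violations = []
    for node in sorted(targets):
        for prop, required in shape["min_count"].items():
            found = sum(1 for s, p, _ in triples if s == node and p == prop)
            if found < required:
                violations.append((node, prop, found, required))
    return violations
```

Each violation names the incomplete documentation item and the missing property, which is the kind of report a notified body or manufacturer could act on.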


Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus

arXiv.org Artificial Intelligence

Biomedical entity linking, a main component of automatic information extraction from health-related texts, plays a pivotal role in connecting textual entities (such as diseases, drugs, and body parts mentioned by patients) to their corresponding concepts in a structured biomedical knowledge base. The task remains challenging despite recent developments in natural language processing. This paper presents the first evaluated biomedical entity linking model for the Dutch language. We use MedRoBERTa.nl as the base model and perform second-phase pretraining through self-alignment on a Dutch biomedical ontology extracted from the UMLS and Dutch SNOMED. We derive from Wikipedia a corpus of ontology-linked Dutch biomedical entities in context and fine-tune our model on this dataset. We evaluate our model on the Dutch portion of the Mantra GSC-corpus and achieve 54.7% classification accuracy and 69.8% 1-distance accuracy. We then perform a case study on a collection of unlabeled patient-support forum data and show that our model is hampered by the limited quality of the preceding entity recognition step. Manual evaluation of a small sample indicates that, of the correctly extracted entities, around 65% are linked to the correct concept in the ontology. Our results indicate that biomedical entity linking in a language other than English remains challenging, but our Dutch model can be used for high-level analysis of patient-generated text.


Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue!

arXiv.org Artificial Intelligence

There is increasing evidence that question-answering (QA) systems with Large Language Models (LLMs) that employ a knowledge graph/semantic representation of an enterprise SQL database (i.e. Text-to-SPARQL) achieve higher accuracy than systems that answer questions directly on SQL databases (i.e. Text-to-SQL). Our previous benchmark research showed that using a knowledge graph improved accuracy from 16% to 54%. The question remains: how can we further improve accuracy and reduce the error rate? Building on our previous observation that inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC), which detects errors by leveraging the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, and 2) LLM Repair, which uses the error explanations with an LLM to repair the SPARQL query. Using the "chat with the data" benchmark, our primary finding is that our approach increases overall accuracy to 72%, with an additional 8% of "I don't know" (unknown) results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, yields higher accuracy for LLM-powered question answering systems.
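One kind of check an ontology-based query validator can perform is a domain/range test on each triple pattern of a generated query. The sketch below assumes the query has already been parsed into `(subject, predicate, object)` patterns and that variable types come from `rdf:type` patterns; the schema (`Policy`, `PolicyHolder`, `hasPolicyHolder`, `hasClaim`) and the function `check_query` are hypothetical and do not reproduce the authors' OBQC rules. It returns human-readable explanations, the kind of output a repair step could feed back to an LLM.

```python
# Hypothetical schema: property -> (domain class, range class).
ONTOLOGY = {
    "hasPolicyHolder": ("Policy", "PolicyHolder"),
    "hasClaim": ("Policy", "Claim"),
}


def check_query(patterns, ontology):
    """Check triple patterns against property domains and ranges.

    patterns: list of (subject, predicate, object); variables look like '?x'.
    Returns a list of error explanations (empty if the query passes).
    """
    # Types asserted in the query itself, e.g. ('?p', 'rdf:type', 'Policy').
    types = {s: o for s, p, o in patterns if p == "rdf:type"}
    errors = []
    for s, p, o in patterns:
        if p == "rdf:type":
            continue
        if p not in ontology:
            errors.append(f"Property {p} is not defined in the ontology.")
            continue
        domain, rng = ontology[p]
        if s in types and types[s] != domain:
            errors.append(f"Subject {s} of {p} has type {types[s]}, "
                          f"but the domain of {p} is {domain}.")
        if o in types and types[o] != rng:
            errors.append(f"Object {o} of {p} has type {types[o]}, "
                          f"but the range of {p} is {rng}.")
    return errors
```

For simplicity this sketch compares class names for equality and ignores subclass hierarchies, which a real check would need to follow.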


Decision support system for Forest fire management using Ontology with Big Data and LLMs

arXiv.org Artificial Intelligence

Forests are crucial for ecological balance, but wildfires, a major cause of forest loss, pose significant risks. Fire weather indices, which assess wildfire risk and predict resource demands, are vital. With the rise of sensor networks in fields such as healthcare and environmental monitoring, semantic sensor networks are increasingly used to gather climatic data such as wind speed, temperature, and humidity. However, processing these data streams to determine fire weather indices presents challenges, underscoring the growing importance of effective forest fire detection. This paper discusses using Apache Spark for early forest fire detection, enhancing fire risk prediction with meteorological and geographical data. Building on our previous development of Semantic Sensor Network (SSN) ontologies and Semantic Web Rule Language (SWRL) rules for managing forest fires in Monesterial Natural Park, we expanded the SWRL rules to improve a Decision Support System (DSS) using Large Language Models (LLMs) and the Spark framework. We implemented real-time alerts with Spark Streaming, tailored to various fire scenarios, and validated our approach using ontology metrics, query-based evaluations, LLM scores, and precision, recall, and F1 measures.
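The rule-driven alerting described above can be sketched as rules evaluated over a stream of weather readings. This is a plain-Python illustration, not Spark Streaming and not the paper's SWRL rules: the thresholds in `FIRE_RULES` and the reading fields (`station`, `temp_c`, `humidity_pct`, `wind_kmh`) are hypothetical and do not constitute an official fire weather index. Rules are ordered from most to least severe, and the first match determines the alert level.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical thresholds for illustration only; not an official fire
# weather index and not the rules used in the paper. Most severe first.
FIRE_RULES = [
    ("EXTREME", lambda r: r["temp_c"] >= 35 and r["humidity_pct"] <= 20 and r["wind_kmh"] >= 30),
    ("HIGH", lambda r: r["temp_c"] >= 30 and r["humidity_pct"] <= 30),
]


def alerts(stream: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (station, level) for each reading that fires a rule."""
    for reading in stream:
        for level, condition in FIRE_RULES:
            if condition(reading):
                yield reading["station"], level
                break  # report only the most severe matching level
```

Because `alerts` is a generator, it processes readings one at a time as they arrive, which mirrors the per-record evaluation a streaming engine would perform.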


Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought

arXiv.org Artificial Intelligence

Drought is a complex environmental phenomenon that affects millions of people and communities across the globe and is too elusive to be accurately predicted. This is mostly due to the scale and variability of the web of environmental parameters that directly or indirectly cause the onset of different categories of drought. Since the dawn of humankind, efforts have been made to understand the natural indicators that signal likely environmental events. These indicators, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge, so that diverse environmental information can be incorporated into a more accurate and reliable drought forecast. Hence, the core objective of this research is to develop a semantics-based data integration middleware that integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought, gathered from domain experts, is transformed into rules that are used together with sensor data to perform deductive inference and determine the onset of drought through the middleware's automated inference generation module. The semantic middleware incorporates, inter alia, a distributed architecture consisting of a streaming data processing engine based on Apache Kafka for real-time stream processing, a rule-based reasoning module, and an ontology module for the semantic representation of the knowledge bases.
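Deductive inference over rules derived from indigenous knowledge and from sensor data can be illustrated with simple forward chaining to a fixed point. The sketch below is not the middleware's reasoning module: every fact and rule name (`ik_indicator_observed`, `soil_moisture_below_threshold`, `drought_onset_alert`, and so on) and the soil-moisture threshold are hypothetical placeholders. One rule stands in for an expert-provided indigenous indicator, one for a sensor-derived condition, and a third combines them.

```python
def forward_chain(facts, rules):
    """Apply rules (premises -> conclusion) until no new fact is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules combining an indigenous-knowledge indicator with sensor data.
DROUGHT_RULES = [
    (frozenset({"ik_indicator_observed"}), "ik_dry_season_signal"),
    (frozenset({"soil_moisture_below_threshold"}), "sensor_dry_signal"),
    (frozenset({"ik_dry_season_signal", "sensor_dry_signal"}), "drought_onset_alert"),
]


def sensor_facts(reading, soil_moisture_threshold=0.15):
    """Turn a raw sensor reading into symbolic facts (threshold is hypothetical)."""
    facts = set()
    if reading["soil_moisture"] < soil_moisture_threshold:
        facts.add("soil_moisture_below_threshold")
    return facts
```

An alert is derived only when both the indigenous indicator and the sensor condition hold, which reflects the integration of the two knowledge sources described above.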


On-device Online Learning and Semantic Management of TinyML Systems

arXiv.org Artificial Intelligence

Recent advances in Tiny Machine Learning (TinyML) empower low-footprint embedded devices to perform real-time on-device machine learning. While many acknowledge the potential benefits of TinyML, its practical implementation presents unique challenges. This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production: (1) Embedded devices operate in dynamically changing conditions. Existing TinyML solutions primarily focus on inference, with models trained offline on powerful machines and deployed as static objects. However, static models may underperform in the real world due to evolving input data distributions. We propose online learning to enable training on constrained devices, adapting local models to the latest field conditions. (2) Nevertheless, current on-device learning methods struggle with heterogeneous deployment conditions and the scarcity of labeled data when applied across numerous devices. We introduce federated meta-learning incorporating online learning to enhance model generalization and facilitate rapid learning. This approach ensures optimal performance across distributed devices through knowledge sharing. (3) Moreover, TinyML's pivotal advantage is its widespread adoption. Embedded devices and TinyML models prioritize extreme efficiency, leading to diverse characteristics ranging from memory and sensors to model architectures. Given this diversity and their non-standardized representations, managing these resources becomes challenging as TinyML systems scale up. We present semantic management for the joint management of models and devices at scale. We demonstrate our methods through a basic regression example and then assess them in three real-world TinyML applications: handwritten character image classification, keyword audio classification, and smart building presence detection, confirming the effectiveness of our approaches.
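The combination of on-device online learning and federated meta-learning can be sketched on a one-dimensional linear regression. The example below uses federated Reptile, one simple form of federated meta-learning chosen here for illustration; the abstract does not state that the authors use this algorithm. Each device adapts the shared initialization with a few online SGD steps on its own samples, and the server moves the initialization toward the average of the adapted weights. All function names and hyperparameters (`lr`, `steps`, `rounds`, `meta_lr`) are hypothetical.

```python
import random


def sgd_steps(w, b, data, rng, lr=0.05, steps=20):
    """Online SGD on squared error for y ~ w*x + b, one sample per step."""
    for _ in range(steps):
        x, y = rng.choice(data)  # simulates a newly arriving field sample
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


def federated_reptile(device_data, rounds=200, meta_lr=0.5, seed=0):
    """Learn a shared initialization (w, b) from several devices' local data."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # shared initialization held by the server
    for _ in range(rounds):
        deltas = []
        for data in device_data:
            wi, bi = sgd_steps(w, b, data, rng)  # local adaptation on a device
            deltas.append((wi - w, bi - b))
        # Reptile meta-update: step toward the average adapted weights.
        w += meta_lr * sum(d[0] for d in deltas) / len(deltas)
        b += meta_lr * sum(d[1] for d in deltas) / len(deltas)
    return w, b
```

Only weight differences leave the devices, not raw data; when devices see different input distributions, the learned initialization is a starting point that each device then fine-tunes online.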


Geospatial Knowledge Graphs

arXiv.org Artificial Intelligence

Geospatial knowledge graphs have emerged as a novel paradigm for representing and reasoning over geospatial information. In this framework, entities such as places, people, events, and observations are depicted as nodes, while their relationships are represented as edges. This graph-based data format lays the foundation for creating a "FAIR" (Findable, Accessible, Interoperable, and Reusable) environment, facilitating the management and analysis of geographic information. This entry first introduces key concepts in knowledge graphs along with their associated standards and tools. It then examines the application of knowledge graphs in geography and environmental sciences, emphasizing their role in bridging symbolic and subsymbolic GeoAI to address cross-disciplinary geospatial challenges. Finally, it outlines new research directions related to geospatial knowledge graphs.
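The node-and-edge view of geographic information can be shown with a handful of triples and a query that follows a spatial containment relation transitively. This is a toy sketch: the entities (`Fire_001`, `Obs_42`) and predicates (`type`, `partOf`, `locatedIn`, `observes`) are hypothetical, and production systems would typically use RDF with a standard vocabulary such as OGC GeoSPARQL rather than Python tuples.

```python
# Hypothetical triples: places, an event, and an observation as nodes.
TRIPLES = {
    ("Santa_Barbara", "type", "City"),
    ("Santa_Barbara", "partOf", "California"),
    ("California", "partOf", "USA"),
    ("Fire_001", "type", "Wildfire"),
    ("Fire_001", "locatedIn", "Santa_Barbara"),
    ("Obs_42", "type", "Observation"),
    ("Obs_42", "observes", "Fire_001"),
}


def objects(triples, subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def regions_containing(triples, place):
    """All regions that `place` lies in, following partOf edges transitively."""
    seen, frontier = set(), [place]
    while frontier:
        current = frontier.pop()
        for parent in objects(triples, current, "partOf"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def events_in_region(triples, region, event_type):
    """Entities of `event_type` located in `region` or in any place within it."""
    result = set()
    for s, p, o in triples:
        if p == "type" and o == event_type:
            for place in objects(triples, s, "locatedIn"):
                if place == region or region in regions_containing(triples, place):
                    result.add(s)
    return result
```

The containment query answers a question that is never stated directly in the data (that the fire lies in California), which illustrates the kind of reasoning the graph format enables.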


Modern Information Technologies in Scientific Research and Educational Activities

arXiv.org Artificial Intelligence

Nowadays, information technology is developing rapidly, which entails the need to constantly improve and expand the capabilities of interactive artificial intelligence systems. This monograph combines several current topics related to the field of information technology. One of the key topics is the methodology for enhancing the capabilities of conversational systems, with a focus on ChatGPT, which represents the latest advance in the field of artificial intelligence. The monograph also discusses text generation systems based on ontological representations, which open up wide opportunities for creating high-quality content. A special place in the work is given to an automated computer system for diagnosing the competitiveness of specialists in the field of information technology. This system helps to effectively assess the professionalism of specialists and determine the need for advanced training. Theoretical aspects of correct color rendering and of the informatization of the educational and research work of graduate students are important for ensuring the quality of education and scientific research. Finally, technology for creating 3D models has become an integral part of the modern information environment, making it possible to bring the most daring ideas and projects to life. Research and development in these areas contribute to the improvement of information technologies, which find application in various fields of activity. The purpose of our monograph is to analyze and research these areas in order to promote the development of information technologies and increase their efficiency. The monograph was compiled based on the results of the XVI international scientific and practical conference "Information technologies and automation -- 2023", which took place in October 2023 at Odessa National University of Technology.


Controlled Query Evaluation through Epistemic Dependencies

arXiv.org Artificial Intelligence

In this paper, we propose the use of epistemic dependencies to express data protection policies in Controlled Query Evaluation (CQE), which is a form of confidentiality-preserving query answering over ontologies and databases. The resulting policy language goes significantly beyond those proposed in the literature on CQE so far, allowing for very rich and practically interesting forms of data protection rules. We show the expressive abilities of our framework and study the data complexity of CQE for (unions of) conjunctive queries when ontologies are specified in the Description Logic DL-Lite_R. Interestingly, while we show that the problem is in general intractable, we prove tractability for the case of acyclic epistemic dependencies by providing a suitable query rewriting algorithm. The latter result paves the way towards the implementation and practical application of this new approach to CQE.