Ontologies
Exploring and Analyzing Machine Commonsense Benchmarks
Santos, Henrique, Gordon, Minor, Liang, Zhicheng, Forbush, Gretchen, McGuinness, Deborah L.
Commonsense question-answering (QA) tasks, in the form of benchmarks, are constantly being introduced to challenge and compare commonsense QA systems. The benchmarks provide question sets that system developers can use to train and test new models before submitting their implementations to official leaderboards. Although these tasks are created to evaluate systems along identified dimensions (e.g., topic, reasoning type), this metadata is limited and is largely presented in an unstructured format or is missing entirely. Because machine common sense (MCS) is a fast-paced field, the problem of fully assessing current benchmarks and systems with regard to these evaluation dimensions is aggravated. We argue that the lack of a common vocabulary for aligning these approaches' metadata limits researchers in their efforts to understand systems' deficiencies and to make effective choices for future tasks. In this paper, we first discuss this MCS ecosystem in terms of its elements and their metadata. Then, we present how we are supporting the assessment of approaches, initially focusing on commonsense benchmarks. We describe our initial MCS Benchmark Ontology, an extensible common vocabulary that formalizes benchmark metadata, and showcase how it supports the development of a Benchmark tool that enables benchmark exploration and analysis.
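As a rough illustration of why a common vocabulary helps, the sketch below maps each benchmark's free-text metadata labels onto shared reasoning-type terms and then queries across benchmarks. The benchmark names (`BenchA`, `BenchB`), the label mapping, and the helper names are hypothetical; the actual MCS Benchmark Ontology is a formal ontology whose classes and properties are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical alignment from benchmark-specific labels to shared terms.
# The real MCS Benchmark Ontology defines its own vocabulary.
LABEL_ALIGNMENT = {
    "naive physics": "physical",
    "physical": "physical",
    "social interactions": "social",
    "time": "temporal",
    "cause-effect": "causal",
}


@dataclass
class Benchmark:
    name: str
    raw_labels: list[str]  # metadata as published, in the benchmark's own words
    reasoning_types: set[str] = field(default_factory=set)

    def align(self) -> None:
        """Map raw labels onto the shared vocabulary, ignoring unknown labels."""
        self.reasoning_types = {
            LABEL_ALIGNMENT[label.lower()]
            for label in self.raw_labels
            if label.lower() in LABEL_ALIGNMENT
        }


def benchmarks_covering(benchmarks, reasoning_type):
    """Return names of benchmarks whose aligned metadata includes a reasoning type."""
    return sorted(b.name for b in benchmarks if reasoning_type in b.reasoning_types)
```

Once labels are aligned, a question such as "which benchmarks test physical reasoning?" becomes a single lookup rather than a manual reading of each benchmark's documentation.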
DynamicHS: Streamlining Reiter's Hitting-Set Tree for Sequential Diagnosis
Given a system that does not work as expected, Sequential Diagnosis (SD) aims at suggesting a series of system measurements to isolate the true explanation for the system's misbehavior from a potentially exponential set of possible explanations. To reason about the best next measurement, SD methods usually require a sample of possible fault explanations at each step of the iterative diagnostic process. The computation of this sample can be accomplished by various diagnostic search algorithms. Among those, Reiter's HS-Tree is one of the most popular due to its desirable properties and general applicability. Usually, HS-Tree is used in a stateless fashion throughout the SD process to (re)compute a sample of possible fault explanations in each iteration, each time given the latest (updated) system knowledge, including all measurements collected so far. In this setting, the search tree is discarded between two iterations, although large parts of it often have to be rebuilt in the next iteration, involving redundant operations and calls to costly reasoning services. As a remedy, we propose DynamicHS, a variant of HS-Tree that maintains state throughout the diagnostic session and additionally employs special strategies to minimize the number of expensive reasoner invocations. In this vein, DynamicHS provides an answer to a longstanding question posed by Raymond Reiter in his seminal 1987 paper. Extensive evaluations on real-world diagnosis problems confirm the viability of DynamicHS and its clear superiority over HS-Tree with respect to computation time. More specifically, DynamicHS outperformed HS-Tree in 96% of the executed sequential diagnosis sessions and, per run, HS-Tree required up to 800% of the time of DynamicHS. Remarkably, DynamicHS achieves these performance improvements while preserving all desirable properties as well as the general applicability of HS-Tree.
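For reference, the stateless baseline that DynamicHS improves on can be sketched as a breadth-first HS-Tree that computes minimal hitting sets (diagnoses) of a collection of conflict sets. This minimal sketch assumes the minimal conflict sets are already known and passed in as a list; in a real diagnoser, each node label comes from a call to a (costly) reasoner, which is exactly the cost DynamicHS tries to reduce. The function name `hs_tree` is hypothetical, and DynamicHS's state maintenance and reasoner-saving strategies are not shown.

```python
from collections import deque


def hs_tree(conflicts, max_diagnoses=None):
    """Breadth-first HS-Tree over an explicit list of (assumed minimal) conflict sets.

    Each node is identified by its path H(n), the set of components on the
    edges from the root. Returns minimal diagnoses in order of increasing size.
    """
    conflicts = [frozenset(c) for c in conflicts]
    diagnoses = []
    seen = set()                  # paths already generated (avoids duplicate nodes)
    queue = deque([frozenset()])  # start from the root, whose path is empty
    while queue:
        path = queue.popleft()
        # Pruning: a superset of a known diagnosis cannot be a minimal diagnosis.
        if any(d <= path for d in diagnoses):
            continue
        # Label the node with any conflict the current path does not hit.
        # (A real diagnoser would ask a reasoner for such a conflict here.)
        label = next((c for c in conflicts if not (c & path)), None)
        if label is None:
            diagnoses.append(path)  # every conflict is hit: a diagnosis
            if max_diagnoses is not None and len(diagnoses) >= max_diagnoses:
                break
            continue
        for comp in sorted(label):  # sorted only for a deterministic expansion order
            child = path | {comp}
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return diagnoses
```

For conflicts `{a, b}` and `{b, c}`, the sketch returns the diagnoses `{b}` and `{a, c}`. In a sequential diagnosis session, a stateless approach would call such a function again from scratch after each new measurement, rebuilding much of the same tree; keeping that tree between iterations is the idea DynamicHS develops.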
XAI4Wind: A Multimodal Knowledge Graph Database for Explainable Decision Support in Operations & Maintenance of Wind Turbines
Chatterjee, Joyjit, Dethlefs, Nina
Condition-based monitoring (CBM) has been widely utilised in the wind industry for monitoring operational inconsistencies and failures in turbines, with techniques ranging from signal processing and vibration analysis to artificial intelligence (AI) models using Supervisory Control and Data Acquisition (SCADA) data. However, existing studies do not present a concrete basis to facilitate explainable decision support in operations and maintenance (O&M), particularly for automated decision support through recommendation of appropriate maintenance action reports corresponding to failures predicted by CBM techniques. Knowledge graph databases (KGs) model a collection of domain-specific information and have played an integral role in real-world decision support in domains such as healthcare and finance, but have received very limited attention in the wind industry. We propose XAI4Wind, a multimodal knowledge graph for explainable decision support in real-world operational turbines, and demonstrate through experiments several use cases of the proposed KG for O&M planning, including interactive querying and reasoning and novel insights obtained with graph data science algorithms. The proposed KG combines multimodal knowledge such as SCADA parameters and alarms, natural-language maintenance actions, and images. By integrating our KG with an Explainable AI model for anomaly prediction, we show that it can provide effective, human-intelligible O&M strategies for predicted operational inconsistencies in various turbine sub-components. This can help instil better trust and confidence in conventionally black-box AI models. We make our KG publicly available and envisage that it can serve as the foundation for providing autonomous decision support in the wind industry.
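To illustrate how a KG query can turn a predicted alarm into a traceable maintenance recommendation, the following sketch walks alarm → component → action edges over a handful of triples. All node names, relation names (`AFFECTS`, `HAS_ACTION`), helper functions, and data are hypothetical; the actual XAI4Wind schema, query language, and data are not reproduced here.

```python
from collections import defaultdict

# Hypothetical triples standing in for a real O&M knowledge graph.
TRIPLES = [
    ("alarm:gearbox_oil_temp_high", "AFFECTS", "component:gearbox"),
    ("alarm:pitch_motor_fault", "AFFECTS", "component:pitch_system"),
    ("component:gearbox", "HAS_ACTION", "action:replace oil filter"),
    ("component:gearbox", "HAS_ACTION", "action:inspect oil cooler"),
    ("component:pitch_system", "HAS_ACTION", "action:reset pitch motor"),
]


def build_index(triples):
    """Index triples as subject -> relation -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        index[subject][relation].append(obj)
    return index


def recommend_actions(index, predicted_alarm):
    """Follow alarm -AFFECTS-> component -HAS_ACTION-> action.

    Returns (component, action) pairs, so each recommendation carries the
    graph path that justifies it. Unknown alarms yield an empty list.
    """
    recs = []
    for comp in index.get(predicted_alarm, {}).get("AFFECTS", []):
        for action in index.get(comp, {}).get("HAS_ACTION", []):
            recs.append((comp, action))
    return recs
```

Returning the component alongside each action is a deliberate choice: the intermediate node is what makes the recommendation explainable to a human operator, rather than just a bare action label.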
Conceptual Software Engineering Applied to Movie Scripts and Stories
This study introduces another application of a software engineering tool, conceptual modeling, to other fields of research. One way to strengthen the relationship between software engineering and other fields is to develop a sound approach to conceptual modeling that can address the peculiarities of those fields. This study concentrates on the humanities and social sciences, which are usually considered softer and further removed from abstractions and (abstract) machines. Specifically, we focus on conceptual modeling as a software engineering tool (e.g., UML) in the area of stories and movie scripts. Researchers in the humanities and social sciences might not use the same degree of formalization that engineers do, but they still find conceptual modeling useful. Current modeling techniques (e.g., UML) fall short in this task because they are geared toward the creation of software systems. A conceptual modeling language, ConML, has been proposed with the humanities and social sciences in mind and, as claimed, can be used to model anything. This study is a venture in this direction, in which a software modeling technique, the Thinging Machine (TM), is applied to movie scripts and stories. The paper presents a novel approach to developing diagrammatic static/dynamic models of movie scripts and stories. The TM model diagram serves as a neutral and independent representation of narrative discourse and can be used as a communication instrument among participants. The examples presented, drawn from Propp's model of fairytales, The Railway Children, and an actual movie script, suggest that the approach is viable.
Investigating ADR mechanisms with knowledge graph mining and explainable AI
Bresso, Emmanuel, Monnin, Pierre, Bousquet, Cédric, Calvier, François-Elie, Ndiaye, Ndeye-Coumba, Petitpain, Nadine, Smaïl-Tabbone, Malika, Coulet, Adrien
Adverse drug reactions (ADRs) are characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanisms remain unknown in most cases. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established. We propose to mine knowledge graphs to identify biomolecular features that may enable automatically reproducing expert classifications that distinguish drugs causative or not for a given type of ADR. From an explainable AI perspective, we explore simple classification techniques such as decision trees and classification rules because they provide human-readable models, which explain the classification itself but may also provide elements of explanation for the molecular mechanisms behind ADRs. In summary, we mine a knowledge graph for features; we train classifiers to distinguish drugs associated or not with ADRs; we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and we manually evaluate how explanatory they may be. Extracted features reproduce, with good fidelity, classifications of drugs causative or not for drug-induced liver injury (DILI) and severe cutaneous adverse reactions (SCAR). Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively, and partially agreed (2/3) for 90% and 77% of them. Knowledge graphs provide diverse features that enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, the most discriminative features appear to be good candidates for investigating ADR mechanisms further.
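One simple way to rank discriminative KG features is information gain, the splitting criterion behind ID3-style decision trees. The sketch below treats each drug as a set of KG-derived terms and ranks binary features (term present or absent) by how much they reduce the entropy of the causative/non-causative labels. The feature names, toy labels, and helper functions are hypothetical, and the paper's exact feature-selection and classifier settings may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(drug_features, labels, feature):
    """Entropy reduction from splitting drugs on presence/absence of one feature."""
    with_f = [y for feats, y in zip(drug_features, labels) if feature in feats]
    without_f = [y for feats, y in zip(drug_features, labels) if feature not in feats]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def rank_features(drug_features, labels):
    """Rank all features by information gain, highest first (ties broken by name)."""
    all_features = set().union(*drug_features)
    return sorted(all_features, key=lambda f: (-information_gain(drug_features, labels, f), f))
```

With drugs described by terms such as `target:T1` or `pathway:P1` (hypothetical identifiers), the top-ranked features are the ones an expert would then review for plausibility as mechanistic explanations, mirroring the manual evaluation step in the abstract.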
Knowledge Graphs in Manufacturing and Production: A Systematic Literature Review
Buchgeher, Georg, Gabauer, David, Martinez-Gil, Jorge, Ehrlinger, Lisa
Knowledge graphs in manufacturing and production aim to make production lines more efficient and flexible, with higher-quality output. This makes knowledge graphs attractive for companies seeking to reach Industry 4.0 goals. However, existing research in the field is quite preliminary, and more research effort is needed to analyze how knowledge graphs can be applied in manufacturing and production. Therefore, we have conducted a systematic literature review to characterize the state of the art in this field, i.e., by identifying existing research and by identifying gaps and opportunities for further research. To do so, we focused on finding the primary studies in the existing literature, which were classified and analyzed according to four criteria: bibliometric key facts, research type facets, knowledge graph characteristics, and application scenarios. In addition, we evaluated the primary studies to gain deeper insights in terms of methodology, empirical evidence, and relevance. As a result, we offer a comprehensive picture of the domain, including the findings that knowledge fusion is currently the main use case for knowledge graphs, that empirical research and industrial application are still missing to a large extent, that graph embeddings are not fully exploited, and that the technical literature is fast-growing but still seems far from its peak.
Graph integration of structured, semistructured and unstructured data for data journalism
Anadiotis, Angelos-Christos, Balalau, Oana, Conceicao, Catarina, Galhardas, Helena, Haddad, Mhd Yamen, Manolescu, Ioana, Merabti, Tayeb, You, Jingmao
Such a query can currently be answered only at a high cost in human effort, e.g., by inspecting a JSON list of the Assemblée's elected officials (available from NosDeputes.fr) and manually connecting the names with those found in a national registry of companies. This considerable effort may still miss connections that could be found by adding information about the spouses of politicians and business people, which is sometimes available in public knowledge bases such as DBpedia or in journalists' notes. No single query language can be used on such heterogeneous data; instead, we study methods to query the corpus by specifying some keywords and asking for all the connections that exist between these keywords, within one data source or across several. This problem has been studied under the name of keyword search over structured data, in particular for relational databases [49, 27], XML documents [24, 33], and RDF graphs [30, 16]. However, most of these works assumed a single data source in which connections among nodes are clearly identified. When authors considered several data sources [31], they still assumed that each query answer comes from a single data source. In contrast, the ConnectionLens system [10] answers keyword search queries over arbitrary combinations of datasets and heterogeneous data models, independently produced by actors unaware of each other's existence.
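A much simplified version of the cross-source connection problem can be sketched as a breadth-first search over a graph whose edges are tagged with the data source they came from. This is not ConnectionLens's algorithm: the sketch returns only one shortest connection between two keywords, uses naive case-insensitive substring matching on node labels, and all node ids, labels, and source names (`knowledge_base`, `company_registry`) are hypothetical.

```python
from collections import deque


def find_connection(edges, labels, kw1, kw2):
    """Shortest path from any node matching kw1 to any node matching kw2.

    edges:  iterable of (u, v, source) tuples; the graph is treated as undirected.
    labels: node id -> text label used for keyword matching.
    Returns (path_of_nodes, sources_of_edges_used) or None if no connection exists.
    """
    adj = {}
    for u, v, source in edges:
        adj.setdefault(u, []).append((v, source))
        adj.setdefault(v, []).append((u, source))
    starts = [n for n, text in labels.items() if kw1.lower() in text.lower()]
    targets = {n for n, text in labels.items() if kw2.lower() in text.lower()}
    parent = {n: None for n in starts}  # node -> (previous node, source of edge used)
    queue = deque(starts)               # multi-source BFS from every kw1 match
    while queue:
        node = queue.popleft()
        if node in targets:
            path, sources = [node], []
            while parent[node] is not None:
                prev, source = parent[node]
                sources.append(source)
                path.append(prev)
                node = prev
            return path[::-1], sources[::-1]
        for nbr, source in adj.get(node, []):
            if nbr not in parent:
                parent[nbr] = (node, source)
                queue.append(nbr)
    return None
```

For example, with a politician node linked to a spouse through a knowledge-base edge, and the spouse linked to a company through a registry edge, a query for the politician's name and the company's name returns a two-edge path whose sources show that the connection spans two independently produced datasets.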
Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies
Badenes-Olmedo, Carlos, García, Jose-Luis Redondo, Corcho, Oscar
With the ongoing growth in the number of digital articles in a wider set of languages and the expanding use of different languages, we need annotation methods that enable browsing multilingual corpora. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple languages. However, these approaches require theme-aligned training data to create a language-independent space. This constraint limits the number of scenarios in which the technique can be applied and makes it difficult to scale up to situations where a huge collection of multilingual documents is required during the training phase. This paper presents an unsupervised document similarity algorithm that does not require parallel or comparable corpora, or any other type of translation resource. The algorithm annotates topics automatically created from documents in a single language with cross-lingual labels and describes documents by hierarchies of multilingual concepts from independently trained models. Experiments performed on the English, Spanish and French editions of the JRC-Acquis corpora reveal promising results on classifying and sorting documents by similar content.
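The idea of comparing documents through language-independent concepts rather than words can be sketched as follows. This minimal version assumes each document already has topic weights from its own monolingual model and that each topic has been annotated with cross-lingual concept labels; the hierarchy is simplified to one level per top-ranked topic and compared with a rank-weighted Jaccard score. The level construction, the weighting, and all identifiers are assumptions for illustration, not the paper's exact algorithm.

```python
def concept_hierarchy(topic_weights, topic_concepts, levels=3):
    """Describe a document as a list of concept sets, most relevant level first.

    topic_weights:  topic id -> weight of that topic in this document
                    (from a model trained on this document's language only).
    topic_concepts: topic id -> set of language-independent concept labels.
    Level i holds the concepts of the document's i-th most relevant topic.
    """
    ranked = sorted(topic_weights, key=lambda t: (-topic_weights[t], t))[:levels]
    return [frozenset(topic_concepts.get(t, ())) for t in ranked]


def hierarchy_similarity(h1, h2):
    """Rank-weighted Jaccard similarity over aligned levels (earlier levels count more)."""
    total, norm = 0.0, 0.0
    for i, (a, b) in enumerate(zip(h1, h2)):
        weight = 1.0 / (i + 1)
        union = a | b
        total += weight * (len(a & b) / len(union) if union else 0.0)
        norm += weight
    return total / norm if norm else 0.0
```

Because the comparison only looks at concept labels, an English and a Spanish document can be scored by models trained independently on each language, without parallel corpora or translation resources, which is the property the abstract emphasizes.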
Smart Mobility Ontology: Current Trends and Future Directions
Yazdizadeh, Ali, Farooq, Bilal
Ontology, as a discipline of philosophy, explains the nature of existence and has its roots in the studies of Aristotle and Plato on "metaphysics" (Welty and Guarino, 2001). The word ontology, however, derives from two Greek words, ontos (being) and logos (word), and was first coined in the sixteenth century by German philosophers (Welty and Guarino, 2001). From then until the mid-twentieth century, ontology evolved mainly as a branch of philosophy. With the advent of artificial intelligence in the 1950s, however, researchers perceived the need for ontology to describe a new world of intelligent systems (Welty and Guarino, 2001). Moreover, with the development of the World Wide Web in the 1990s, ontology development became common among domain specialists as a way to define and share the concepts and entities of their fields on the Internet (Noy et al., 2001). During the last three decades, ontology development studies have shifted from theoretical issues to the practical implications of using ontologies in real-world, large-scale applications (Noy et al., 2001). Nowadays, ontology development focuses mainly on defining machine-interpretable concepts and their relationships in a domain. However, ontology development also pursues other goals, such as providing a common conceptualization of the domain on which different experts agree (Métral and Cutting-Decelle, 2011) and enabling them to reuse domain knowledge (Noy et al., 2001). It also enables researchers to easily analyze domain knowledge and to express domain assumptions explicitly.
Accelerating Road Sign Ground Truth Construction with Knowledge Graph and Machine Learning
Kim, Ji Eun, Henson, Cory, Huang, Kevin, Tran, Tuan A., Lin, Wan-Yi
Having a comprehensive, high-quality dataset of road sign annotations is critical to the success of AI-based Road Sign Recognition (RSR) systems. In practice, annotators often have difficulty learning the road sign systems of different countries; hence, the tasks are often time-consuming and produce poor results. We propose a novel approach using knowledge graphs and a machine learning algorithm, the variational prototyping-encoder (VPE), to assist human annotators in classifying road signs effectively. Annotators can query the Road Sign Knowledge Graph using visual attributes and receive the closest matching candidates suggested by the VPE model. The VPE model takes the candidates from the knowledge graph and a real sign image patch as inputs. We show that our knowledge graph approach can reduce the sign search space by 98.9%. Furthermore, with VPE, our system can propose the correct single candidate for 75% of signs in the tested datasets, eliminating the human search effort entirely in those cases.
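The two-stage workflow (an attribute query to narrow the candidates, then similarity ranking against the image patch) can be sketched with a toy catalogue. The sign entries, attribute names, helper functions, and 2-D "prototype" vectors are hypothetical stand-ins: a real VPE model learns its embeddings from images, and the Road Sign Knowledge Graph is queried rather than filtered in a Python dict.

```python
import math

# Hypothetical sign catalogue: attributes an annotator can see, plus a toy
# prototype embedding standing in for what a learned encoder would produce.
CATALOGUE = {
    "stop":            {"shape": "octagon",  "color": "red",    "proto": (0.9, 0.1)},
    "yield":           {"shape": "triangle", "color": "red",    "proto": (0.2, 0.8)},
    "warning_generic": {"shape": "triangle", "color": "yellow", "proto": (0.3, 0.7)},
    "speed_50":        {"shape": "circle",   "color": "red",    "proto": (0.5, 0.5)},
}


def query_candidates(catalogue, **attrs):
    """KG-style filter: keep signs matching every given visual attribute."""
    return [name for name, info in catalogue.items()
            if all(info.get(k) == v for k, v in attrs.items())]


def search_space_reduction(catalogue, candidates):
    """Fraction of the catalogue eliminated by the attribute query."""
    return 1.0 - len(candidates) / len(catalogue)


def closest_candidate(catalogue, candidates, patch_embedding):
    """Pick the filtered candidate whose prototype is nearest (Euclidean) to the patch."""
    if not candidates:
        return None
    return min(candidates, key=lambda name: math.dist(catalogue[name]["proto"], patch_embedding))
```

`search_space_reduction` computes the kind of percentage the abstract reports (98.9% on the real knowledge graph); the toy figures produced here are not comparable. Filtering first keeps the similarity ranking small and lets the annotator see why a candidate survived.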