Ontologies
Ontology drift is a challenge for explainable data governance
We introduce the needs for explainable AI that arise from Standard No. 239 from the Basel Committee on Banking Standards (BCBS 239), which outlines 11 principles for effective risk data aggregation and risk reporting for financial institutions. Of these, explainableAI is necessary for compliance in two key aspects: data quality, and appropriate reporting for multiple stakeholders. We describe the implementation challenges for one specific regulatory requirement:that of having a complete data taxonomy that is appropriate for firmwide use. The constantly evolving nature of financial ontologies necessitate a continuous updating process to ensure ongoing compliance.
DySR: A Dynamic Representation Learning and Aligning based Model for Service Bundle Recommendation
Liu, Mingyi, Tu, Zhiying, Xu, Xiaofei, Wang, Zhongjie
An increasing number and diversity of services are available, which result in significant challenges to effective reuse service during requirement satisfaction. There have been many service bundle recommendation studies and achieved remarkable results. However, there is still plenty of room for improvement in the performance of these methods. The fundamental problem with these studies is that they ignore the evolution of services over time and the representation gap between services and requirements. In this paper, we propose a dynamic representation learning and aligning based model called DySR to tackle these issues. DySR eliminates the representation gap between services and requirements by learning a transformation function and obtains service representations in an evolving social environment through dynamic graph representation learning. Extensive experiments conducted on a real-world dataset from ProgrammableWeb show that DySR outperforms existing state-of-the-art methods in commonly used evaluation metrics, improving $F1@5$ from $36.1\%$ to $69.3\%$.
ProcessCO v1.3's Terms, Properties, Relationships and Axioms - A Core Ontology for Processes
The present preprint specifies and defines all Terms, Properties, Relationships and Axioms of ProcessCO (Process Core Ontology). ProcessCO is an ontology devoted mainly for Work Entities and related terms, which is placed at the core level in the context of a multilayer ontological architecture called FCD-OntoArch (Foundational, Core, and Domain Ontological Architecture for Sciences). This is a five-layered ontological architecture, which considers Foundational, Core, Domain and Instance levels, where the domain level is split down in two sub-levels, namely: Top-domain and Low-domain. Ontologies at the same level can be related to each other, except for the foundational level where only ThingFO (Thing Foundational Ontology) is found. In addition, ontologies' terms and relationships at lower levels can be semantically enriched by ontologies' terms and relationships from the higher levels. Note that both ThingFO and ontologies at the core level such as ProcessCO, SituationCO, among others, are domain independent with respect to their terms. Stereotypes are the mechanism used for enriching ProcessCO terms mainly from the ThingFO ontology. Note that in the end of this document, we address the ProcessCO vs. ThingFO non-taxonomic relationship verification matrix.
Cybonto: Towards Human Cognitive Digital Twins for Cybersecurity
Cyber defense is reactive and slow. On average, the time-to-remedy is hundreds of times larger than the time-to-compromise. In response to the expanding ever-more-complex threat landscape, Digital Twins (DTs) and particularly Human Digital Twins (HDTs) offer the capability of running massive simulations across multiple knowledge domains. Simulated results may offer insights into adversaries' behaviors and tactics, resulting in better proactive cyber-defense strategies. For the first time, this paper solidifies the vision of DTs and HDTs for cybersecurity via the Cybonto conceptual framework proposal. The paper also contributes the Cybonto ontology, formally documenting 108 constructs and thousands of cognitive-related paths based on 20 time-tested psychology theories. Finally, the paper applied 20 network centrality algorithms in analyzing the 108 constructs. The identified top 10 constructs call for extensions of current digital cognitive architectures in preparation for the DT future.
Extending Sticky-Datalog+/- via Finite-Position Selection Functions: Tractability, Algorithms, and Optimization
Bertossi, Leopoldo, Milani, Mostafa
Weakly-Sticky(WS) Datalog+/- is an expressive member of the family of Datalog+/- program classes that is defined on the basis of the conditions of stickiness and weak-acyclicity. Conjunctive query answering (QA) over the WS programs has been investigated, and its tractability in data complexity has been established. However, the design and implementation of practical QA algorithms and their optimizations have been open. In order to fill this gap, we first study Sticky and WS programs from the point of view of the behavior of the chase procedure. We extend the stickiness property of the chase to that of generalized stickiness of the chase (GSCh) modulo an oracle that selects (and provides) the predicate positions where finitely values appear during the chase. Stickiness modulo a selection function S that provides only a subset of those positions defines sch(S), a semantic subclass of GSCh. Program classes with selection functions include Sticky and WS, and another syntactic class that we introduce and characterize, namely JWS, of jointly-weakly-sticky programs, which contains WS. The selection functions for these last three classes are computable, and no external, possibly non-computable oracle is needed. We propose a bottom-up QA algorithm for programs in the class sch(S), for a general selection function S. As a particular case, we obtain a polynomial-time QA algorithm for JWS and weakly-sticky programs. Unlike WS, JWS turns out to be closed under magic-sets query optimization. As a consequence, both the generic polynomial-time QA algorithm and its magic-set optimization can be particularized and applied to WS.
A Cyber Science Based Ontology for Artificial General Intelligence Containment
Pittman, Jason M., Crosby, Courtney
The development of artificial general intelligence is considered by many to be inevitable. What such intelligence does after becoming aware is not so certain. To that end, research suggests that the likelihood of artificial general intelligence becoming hostile to humans is significant enough to warrant inquiry into methods to limit such potential. Thus, containment of artificial general intelligence is a timely and meaningful research topic. While there is limited research exploring possible containment strategies, such work is bounded by the underlying field the strategies draw upon. Accordingly, we set out to construct an ontology to describe necessary elements in any future containment technology. Using existing academic literature, we developed a single domain ontology containing five levels, 32 codes, and 32 associated descriptors. Further, we constructed ontology diagrams to demonstrate intended relationships. We then identified humans, AGI, and the cyber world as novel agent objects necessary for future containment activities.
Chest ImaGenome Dataset for Clinical Reasoning
Wu, Joy T., Agu, Nkechinyere N., Lourentzou, Ismini, Sharma, Arjun, Paguio, Joseph A., Yao, Jasper S., Dee, Edward C., Mitchell, William, Kashyap, Satyananda, Giovannini, Andrea, Celi, Leo A., Moradi, Mehdi
Despite the progress in automatic detection of radiologic findings from chest X-ray (CXR) images in recent years, a quantitative evaluation of the explainability of these models is hampered by the lack of locally labeled datasets for different findings. With the exception of a few expert-labeled small-scale datasets for specific findings, such as pneumonia and pneumothorax, most of the CXR deep learning models to date are trained on global "weak" labels extracted from text reports, or trained via a joint image and unstructured text learning strategy. Inspired by the Visual Genome effort in the computer vision community, we constructed the first Chest ImaGenome dataset with a scene graph data structure to describe $242,072$ images. Local annotations are automatically produced using a joint rule-based natural language processing (NLP) and atlas-based bounding box detection pipeline. Through a radiologist constructed CXR ontology, the annotations for each CXR are connected as an anatomy-centered scene graph, useful for image-level reasoning and multimodal fusion applications. Overall, we provide: i) $1,256$ combinations of relation annotations between $29$ CXR anatomical locations (objects with bounding box coordinates) and their attributes, structured as a scene graph per image, ii) over $670,000$ localized comparison relations (for improved, worsened, or no change) between the anatomical locations across sequential exams, as well as ii) a manually annotated gold standard scene graph dataset from $500$ unique patients.
Tab2Know: Building a Knowledge Base from Tables in Scientific Papers
Kruit, Benno, He, Hongyu, Urbani, Jacopo
Tables in scientific papers contain a wealth of valuable knowledge for the scientific enterprise. To help the many of us who frequently consult this type of knowledge, we present Tab2Know, a new end-to-end system to build a Knowledge Base (KB) from tables in scientific papers. Tab2Know addresses the challenge of automatically interpreting the tables in papers and of disambiguating the entities that they contain. To solve these problems, we propose a pipeline that employs both statistical-based classifiers and logic-based reasoning. First, our pipeline applies weakly supervised classifiers to recognize the type of tables and columns, with the help of a data labeling system and an ontology specifically designed for our purpose. Then, logic-based reasoning is used to link equivalent entities (via sameAs links) in different tables. An empirical evaluation of our approach using a corpus of papers in the Computer Science domain has returned satisfactory performance. This suggests that ours is a promising step to create a large-scale KB of scientific knowledge.
Efficient TBox Reasoning with Value Restrictions using the $\mathcal{FL}_{o}$wer reasoner
Baader, Franz, Koopmann, Patrick, Michel, Friedrich, Turhan, Anni-Yasmin, Zarrieß, Benjamin
The inexpressive Description Logic (DL) $\mathcal{FL}_0$, which has conjunction and value restriction as its only concept constructors, had fallen into disrepute when it turned out that reasoning in $\mathcal{FL}_0$ w.r.t. general TBoxes is ExpTime-complete, i.e., as hard as in the considerably more expressive logic $\mathcal{ALC}$. In this paper, we rehabilitate $\mathcal{FL}_0$ by presenting a dedicated subsumption algorithm for $\mathcal{FL}_0$, which is much simpler than the tableau-based algorithms employed by highly optimized DL reasoners. Our experiments show that the performance of our novel algorithm, as prototypically implemented in our $\mathcal{FL}_o$wer reasoner, compares very well with that of the highly optimized reasoners. $\mathcal{FL}_o$wer can also deal with ontologies written in the extension $\mathcal{FL}_{\bot}$ of $\mathcal{FL}_0$ with the top and the bottom concept by employing a polynomial-time reduction, shown in this paper, which eliminates top and bottom. We also investigate the complexity of reasoning in DLs related to the Horn-fragments of $\mathcal{FL}_0$ and $\mathcal{FL}_{\bot}$.
A Study of the Quality of Wikidata
Shenoy, Kartik, Ilievski, Filip, Garijo, Daniel, Schwabe, Daniel, Szekely, Pedro
Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: 1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality; 2) statements that have been deprecated; and 3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement ongoing efforts by the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes.