 Ontologies


Traveling tourist Part 1: Import WikiData to Neo4j with Neosemantics library

#artificialintelligence

After a short summer break, I have prepared a new blog series. In this first part, we will construct a knowledge graph of monuments located in Spain. As you might know, I have lately gained a lot of interest and respect for the wealth of knowledge that is available through the WikiData API. We will continue honing our SPARQL syntax knowledge and fetch the information regarding the monuments located in Spain from the WikiData API. I wasn't aware of this before, but scraping the RDF data available online and importing it into Neo4j is such a popular topic that Dr. Jesus Barrasa developed a Neosemantics library to help us with this process.
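As a rough illustration of the kind of request described above, here is a minimal Python sketch that builds a SPARQL query for monuments in Spain and the URL that would send it to the WikiData query service. The post's own query is not shown in this excerpt, so everything here is an assumption: the helper names `build_monument_query` and `build_request_url` are hypothetical, and the identifiers (Q29 for Spain, Q4989906 for monument, P31/P279/P17 for instance of, subclass of, and country) should be checked against WikiData before use. The sketch only constructs the request; it does not call the API or import anything into Neo4j.

```python
from urllib.parse import urlencode

# Public WikiData SPARQL endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_monument_query(country_qid="Q29", class_qid="Q4989906", limit=100):
    """Return a SPARQL SELECT query for items of a class located in a country.

    Assumed defaults: Q29 = Spain, Q4989906 = monument (verify on WikiData).
    """
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P17 wd:{country_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query):
    """URL-encode the query and ask the endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return f"{WIKIDATA_SPARQL_ENDPOINT}?{params}"


if __name__ == "__main__":
    query = build_monument_query(limit=10)
    print(query)
    print(build_request_url(query))
```

The resulting URL can be fetched with any HTTP client. Importing RDF into Neo4j through Neosemantics would typically use a CONSTRUCT query that returns triples rather than a SELECT, which this sketch does not cover.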


Beyond Social Media Analytics: Understanding Human Behaviour and Deep Emotion using Self Structuring Incremental Machine Learning

arXiv.org Machine Learning

This thesis develops a conceptual framework considering social data as representing the surface layer of a hierarchy of human social behaviours, needs and cognition which is employed to transform social data into representations that preserve social behaviours and their causalities. Based on this framework two platforms were built to capture insights from fast-paced and slow-paced social data. For fast-paced data, a self-structuring and incremental learning technique was developed to automatically capture salient topics and corresponding dynamics over time. An event detection technique was developed to automatically monitor those identified topic pathways for significant fluctuations in social behaviours using multiple indicators such as volume and sentiment. This platform is demonstrated using two large datasets with over 1 million tweets. The separated topic pathways were representative of the key topics of each entity and coherent against topic coherence measures. Identified events were validated against contemporary events reported in news. Second, for the slow-paced social data, a suite of new machine learning and natural language processing techniques was developed to automatically capture self-disclosed information of the individuals such as demographics, emotions and timeline of personal events. This platform was trialled on a large text corpus of over 4 million posts collected from online support groups. This was further extended to transform prostate cancer related online support group discussions into a multidimensional representation and to investigate the self-disclosed quality of life of patients (and partners) against time, demographics and clinical factors. The capabilities of this extended platform have been demonstrated using a text corpus collected from 10 prostate cancer online support groups comprising 609,960 prostate cancer discussions and 22,233 patients.


Phenotypical Ontology Driven Framework for Multi-Task Learning

arXiv.org Artificial Intelligence

Despite the large number of patients in Electronic Health Records (EHRs), the subset of usable data for modeling outcomes of specific phenotypes is often imbalanced and of modest size. This can be attributed to the uneven coverage of medical concepts in EHRs. In this paper, we propose OMTL, an Ontology-driven Multi-Task Learning framework, that is designed to overcome such data limitations. The key contribution of our work is the effective use of knowledge from a predefined well-established medical relationship graph (ontology) to construct a novel deep learning network architecture that mirrors this ontology. This enables common representations to be shared across related phenotypes, and was found to improve the learning performance. The proposed OMTL naturally allows for multitask learning of different phenotypes on distinct predictive tasks. These phenotypes are tied together by their semantic distance according to the external medical ontology. Using the publicly available MIMIC-III database, we evaluate OMTL and demonstrate its efficacy on several real patient outcome predictions over state-of-the-art multi-task learning schemes.


Answering Counting Queries over DL-Lite Ontologies

arXiv.org Artificial Intelligence

Ontology-mediated query answering (OMQA) is a promising approach to data access and integration that has been actively studied in the knowledge representation and database communities for more than a decade. The vast majority of work on OMQA focuses on conjunctive queries, whereas more expressive queries that feature counting or other forms of aggregation remain largely unexplored. In this paper, we introduce a general form of counting query, relate it to previous proposals, and study the complexity of answering such queries in the presence of DL-Lite ontologies. As it follows from existing work that query answering is intractable and often of high complexity, we consider some practically relevant restrictions, for which we establish improved complexity bounds.


CODO: An Ontology for Collection and Analysis of Covid-19 Data

arXiv.org Artificial Intelligence

The COviD-19 Ontology for cases and patient information (CODO) provides a model for the collection and analysis of data about the COVID-19 pandemic. The ontology provides a standards-based open-source model that facilitates the integration of data from heterogeneous data sources. The ontology was designed by analysing disparate COVID-19 data sources such as datasets, literature, services, etc. The ontology follows the best practices for vocabularies by re-using concepts from other leading vocabularies and by using the W3C standards RDF, OWL, SWRL, and SPARQL. The ontology already has one independent user and has incorporated real-world data from the government of India.


SHACL Satisfiability and Containment (Extended Paper)

arXiv.org Artificial Intelligence

The Shapes Constraint Language (SHACL) is a recent W3C recommendation language for validating RDF data. Specifically, SHACL documents are collections of constraints that enforce particular shapes on an RDF graph. Previous work on the topic has provided theoretical and practical results for the validation problem, but did not consider the standard decision problems of satisfiability and containment, which are crucial for verifying the feasibility of the constraints and important for design and optimization purposes. In this paper, we undertake a thorough study of different features of non-recursive SHACL by providing a translation to a new first-order language, called SCL, that precisely captures the semantics of SHACL w.r.t. satisfiability and containment. We study the interaction of SHACL features in this logic and provide the detailed map of decidability and complexity results of the aforementioned decision problems for different SHACL sublanguages. Notably, we prove that both problems are undecidable for the full language, but we present decidable combinations of interesting features.


Trove: Ontology-driven weak supervision for medical entity classification - Docwire News

#artificialintelligence

MOTIVATION: Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time-consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds imperfect training sets from low-cost, less accurate labeling rules, offers a potential solution. Medical ontologies are compelling sources for generating labels; however, combining multiple ontologies without ground truth data creates challenges due to label noise introduced by conflicting entity definitions. Key questions remain on the extent to which weakly supervised entity classification can be automated using ontologies, or how much additional task-specific rule engineering is required for state-of-the-art performance.


Automated Reasoning in Temporal DL-Lite

arXiv.org Artificial Intelligence

This paper investigates the feasibility of automated reasoning over temporal DL-Lite (TDL-Lite) knowledge bases (KBs). We test the usage of off-the-shelf LTL reasoners to check satisfiability of TDL-Lite KBs. In particular, we test the robustness and the scalability of reasoners when dealing with TDL-Lite TBoxes paired with a temporal ABox. We conduct various experiments to analyse the performance of different reasoners by randomly generating TDL-Lite KBs and then measuring the running time and the size of the translations. Furthermore, in an effort to make the usage of TDL-Lite KBs a reality, we present a fully fledged tool with a graphical interface to design them. Our interface is based on conceptual modelling principles and it is integrated with our translation tool and a temporal reasoner.


Wikidata Constraints on MARS (Extended Technical Report)

arXiv.org Artificial Intelligence

Wikidata constraints, albeit useful, are represented and processed in an incomplete, ad hoc fashion. Constraint declarations do not fully express their meaning, and thus do not provide a precise, unambiguous basis for constraint specification, or a logical foundation for constraint-checking implementations. In prior work we have proposed a logical framework for Wikidata as a whole, based on multi-attributed relational structures (MARS) and related logical languages. In this paper we explain how constraints are handled in the proposed framework, and show that nearly all of Wikidata's existing property constraints can be completely characterized in it, in a natural and economical fashion. We also give characterizations for several proposed property constraints, and show that a variety of non-property constraints can be handled in the same framework.


Wikidata on MARS

arXiv.org Artificial Intelligence

Multi-attributed relational structures (MARSs) have been proposed as a formal data model for generalized property graphs, along with multi-attributed rule-based predicate logic (MARPL) as a useful rule-based logic in which to write inference rules over property graphs. Wikidata can be modelled in an extended MARS that adds the (imprecise) datatypes of Wikidata. The rules of inference for the Wikidata ontology can be modelled as a MARPL ontology, with extensions to handle the Wikidata datatypes and functions over these datatypes. Because many Wikidata qualifiers should participate in most inference rules in Wikidata, a method of implicitly handling qualifier values on a per-qualifier basis is needed to make this modelling useful. The meaning of Wikidata is then the extended MARS that is the closure of running these rules on the Wikidata data model. Wikidata constraints can be modelled as multi-attributed predicate logic (MAPL) formulae, again extended with datatypes, that are evaluated over this extended MARS.