Goto

Collaborating Authors

 Ontologies


Streamlining Social Media Information Retrieval for Public Health Research with Deep Learning

arXiv.org Artificial Intelligence

The utilization of social media in epidemic surveillance has been well established. Nonetheless, bias is often introduced when pre-defined lexicons are used to retrieve relevant corpus. This study introduces a framework aimed at curating extensive dictionaries of medical colloquialisms and Unified Medical Language System (UMLS) concepts. The framework comprises three modules: a BERT-based Named Entity Recognition (NER) model that identifies medical entities from social media content, a deep-learning powered normalization module that standardizes the extracted entities, and a semi-supervised clustering module that assigns the most probable UMLS concept to each standardized entity. We applied this framework to COVID-19-related tweets from February 1, 2020, to April 30, 2022, generating a symptom dictionary (available at https://github.com/ningkko/UMLS_colloquialism/) composed of 9,249 standardized entities mapped to 876 UMLS concepts and 38,175 colloquial expressions. This framework demonstrates encouraging potential in addressing the constraints of keyword matching information retrieval in social media-based public health research.


Explainable Representations for Relation Prediction in Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects.


Knowledge-based Multimodal Music Similarity

arXiv.org Artificial Intelligence

Music similarity is an essential aspect of music retrieval, recommendation systems, and music analysis. Moreover, similarity is of vital interest for music experts, as it allows studying analogies and influences among composers and historical periods. Current approaches to musical similarity rely mainly on symbolic content, which can be expensive to produce and is not always readily available. Conversely, approaches using audio signals typically fail to provide any insight about the reasons behind the observed similarity. This research addresses the limitations of current approaches by focusing on the study of musical similarity using both symbolic and audio content. The aim of this research is to develop a fully explainable and interpretable system that can provide end-users with more control and understanding of music similarity and classification systems.


Handling Wikidata Qualifiers in Reasoning

arXiv.org Artificial Intelligence

Wikidata is a knowledge graph increasingly adopted by many communities for diverse applications. Wikidata statements are annotated with qualifier-value pairs that are used to depict information, such as the validity context of the statement, its causality, provenances, etc. Handling the qualifiers in reasoning is a challenging problem. When defining inference rules (in particular, rules on ontological properties (x subclass of y, z instance of x, etc.)), one must consider the qualifiers, as most of them participate in the semantics of the statements. This poses a complex problem because a) there is a massive number of qualifiers, and b) the qualifiers of the inferred statement are often a combination of the qualifiers in the rule condition. In this work, we propose to address this problem by a) defining a categorization of the qualifiers b) formalizing the Wikidata model with a many-sorted logical language; the sorts of this language are the qualifier categories. We couple this logic with an algebraic specification that provides a means for effectively handling qualifiers in inference rules. Using Wikidata ontological properties, we show how to use the MSL and specification to reason on qualifiers. Finally, we discuss the methodology for practically implementing the work and present a prototype implementation. The work can be naturally extended, thanks to the extensibility of the many-sorted algebraic specification, to cover more qualifiers in the specification, such as uncertain time, recurring events, geographic locations, and others.


Reorganizing Educational Institutional Domain using Faceted Ontological Principles

arXiv.org Artificial Intelligence

The purpose of this work is to find out how different library classification systems and linguistic ontologies arrange a particular domain of interest and what are the limitations for information retrieval. We use knowledge representation techniques and languages for construction of a domain specific ontology. This ontology would help not only in problem solving, but it would demonstrate the ease with which complex queries can be handled using principles of domain ontology, thereby facilitating better information retrieval. Facet-based methodology has been used for ontology formalization for quite some time. Ontology formalization involves different steps such as, Identification of the terminology, Analysis, Synthesis, Standardization and Ordering. Firstly, for purposes of conceptualization OntoUML has been used which is a well-founded and established language for Ontology driven Conceptual Modelling. Phase transformation of "the same mode" has been subsequently obtained by OWL-DL using Protégé software. The final OWL ontology contains a total of around 232 axioms. These axioms comprise 148 logical axioms, 76 declaration axioms and 43 classes.


Process Knowledge-infused Learning for Clinician-friendly Explanations

arXiv.org Artificial Intelligence

Language models have the potential to assess mental health using social media data. By analyzing online posts and conversations, these models can detect patterns indicating mental health conditions like depression, anxiety, or suicidal thoughts. They examine keywords, language markers, and sentiment to gain insights into an individual's mental well-being. This information is crucial for early detection, intervention, and support, improving mental health care and prevention strategies. However, using language models for mental health assessments from social media has two limitations: (1) They do not compare posts against clinicians' diagnostic processes, and (2) It's challenging to explain language model outputs using concepts that the clinician can understand, i.e., clinician-friendly explanations. In this study, we introduce Process Knowledge-infused Learning (PK-iL), a new learning paradigm that layers clinical process knowledge structures on language model outputs, enabling clinician-friendly explanations of the underlying language model predictions. We rigorously test our methods on existing benchmark datasets, augmented with such clinical process knowledge, and release a new dataset for assessing suicidality. PK-iL performs competitively, achieving a 70% agreement with users, while other XAI methods only achieve 47% agreement (average inter-rater agreement of 0.72). Our evaluations demonstrate that PK-iL effectively explains model predictions to clinicians.


Web of Things and Trends in Agriculture: A Systematic Literature Review

arXiv.org Artificial Intelligence

In the past few years, the Web of Things (WOT) became a beneficial game-changing technology within the Agriculture domain as it introduces innovative and promising solutions to the Internet of Things (IoT) agricultural applications problems by providing its services. WOT provides the support for integration, interoperability for heterogeneous devices, infrastructures, platforms, and the emergence of various other technologies. The main aim of this study is about understanding and providing a growing and existing research content, issues, and directions for the future regarding WOT-based agriculture. Therefore, a systematic literature review (SLR) of research articles is presented by categorizing the selected studies published between 2010 and 2020 into the following categories: research type, approaches, and their application domains. Apart from reviewing the state-of-the-art articles on WOT solutions for the agriculture field, a taxonomy of WOT-base agriculture application domains has also been presented in this study. A model has also presented to show the picture of WOT based Smart Agriculture. Lastly, the findings of this SLR and the research gaps in terms of open issues have been presented to provide suggestions on possible future directions for the researchers for future research.


On the Correspondence Between Monotonic Max-Sum GNNs and Datalog

arXiv.org Artificial Intelligence

Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.


The Ontology for Agents, Systems and Integration of Services: OASIS version 2

arXiv.org Artificial Intelligence

Semantic representation is a key enabler for several application domains, and the multi-agent systems realm makes no exception. Among the methods for semantically representing agents, one has been essentially achieved by taking a behaviouristic vision, through which one can describe how they operate and engage with their peers. The approach essentially aims at defining the operational capabilities of agents through the mental states related with the achievement of tasks. The OASIS ontology -- An Ontology for Agent, Systems, and Integration of Services, presented in 2019 -- pursues the behaviouristic approach to deliver a semantic representation system and a communication protocol for agents and their commitments. This paper reports on the main modeling choices concerning the representation of agents in OASIS 2, the latest major upgrade of OASIS, and the achievement reached by the ontology since it was first introduced, in particular in the context of ontologies for blockchains.


Temporalising Unique Characterisability and Learnability of Ontology-Mediated Queries

arXiv.org Artificial Intelligence

Temporal ontology-mediated query answering provides a framework for accessing temporal data using background knowledge in the form of a logical theory, often called an ontology. Within this framework, queries are usually constructed by combining a domain query language such as conjunctive queries (CQs) with linear temporal logic ( LTL) operators. Ontologies range from pure description logic (DL) ontologies [11, 16], which hold at all timepoints, to suitable combinations of DLs with LTL [7, 9, 28, 10, 48]. On the query side, LTL is combined with domain queries formulated usually as conjunctive queries (CQs). To ensure good semantic and computational properties, one often restricts the use of LTL operators to monotone operators and the domain queries to acyclic CQs such as the class ELIQ of CQs that are equivalent to ELI-queries. On the ontology side, the most popular languages are based on the DL-Lite family of DLs, which underpins the OWL2 DL profile of the W3C standard for ontology-based data access [17, 6]. Our aim in this paper is (i) to find out in how far temporal queries can be (polynomially) characterised using temporal data instances if ontologies encoding background knowledge are present and (ii) to apply the results to study the (polynomial) learnability of temporal queries in Angluin's framework of exact learning [4]. Investigating the computation, size, and shape of unique characterisations of queries by data examples contributes to the query-by-example paradigm in databases [37] as the data examples can be naturally used to illustrate, explain, and construct queries. Unique characterisations also provide a'non-procedural' necessary condition for (polynomial time) exact learnability using membership queries, where membership queries to the oracle take the form'does O,D |= q