Goto

Collaborating Authors

 Ontologies


Matching with Transformers in MELT

arXiv.org Artificial Intelligence

One of the strongest signals for automated matching of ontologies and knowledge graphs are the textual descriptions of the concepts. The methods that are typically applied (such as character- or token-based comparisons) are relatively simple, and therefore do not capture the actual meaning of the texts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) is possible. In this paper, we model the ontology matching task as classification problem and present approaches based on transformer models. We further provide an easy to use implementation in the MELT framework which is suited for ontology and knowledge graph matching. We show that a transformer-based filter helps to choose the correct correspondences given a high-recall alignment and already achieves a good result with simple alignment post-processing methods.


An Ontology-Based Information Extraction System for Residential Land Use Suitability Analysis

arXiv.org Artificial Intelligence

We propose an Ontology-Based Information Extraction (OBIE) system to automate the extraction of the criteria and values applied in Land Use Suitability Analysis (LUSA) from bylaw and regulation documents related to the geographic area of interest. The results obtained by our proposed LUSA OBIE system (land use suitability criteria and their values) are presented as an ontology populated with instances of the extracted criteria and property values. This latter output ontology is incorporated into a Multi-Criteria Decision Making (MCDM) model applied for constructing suitability maps for different kinds of land uses. The resulting maps may be the final desired product or can be incorporated into the cellular automata urban modeling and simulation for predicting future urban growth. A case study has been conducted where the output from LUSA OBIE is applied to help produce a suitability map for the City of Regina, Saskatchewan, to assist in the identification of suitable areas for residential development. A set of Saskatchewan bylaw and regulation documents were downloaded and input to the LUSA OBIE system. We accessed the extracted information using both the populated LUSA ontology and the set of annotated documents. In this regard, the LUSA OBIE system was effective in producing a final suitability map.


Network representation learning systematic review: ancestors and current development state

arXiv.org Artificial Intelligence

Real-world information networks are increasingly occurring across various disciplines including online social networks and citation networks. These network data are generally characterized by sparseness, nonlinearity and heterogeneity bringing different challenges to the network analytics task to capture inherent properties from network data. Artificial intelligence and machine learning have been recently leveraged as powerful systems to learn insights from network data and deal with presented challenges. As part of machine learning techniques, graph embedding approaches are originally conceived for graphs constructed from feature represented datasets, like image dataset, in which links between nodes are explicitly defined. These traditional approaches cannot cope with network data challenges. As a new learning paradigm, network representation learning has been proposed to map a real-world information network into a low-dimensional space while preserving inherent properties of the network. In this paper, we present a systematic comprehensive survey of network representation learning, known also as network embedding, from birth to the current development state. Through the undertaken survey, we provide a comprehensive view of reasons behind the emergence of network embedding and, types of settings and models used in the network embedding pipeline. Thus, we introduce a brief history of representation learning and word representation learning ancestor of network embedding. We provide also formal definitions of basic concepts required to understand network representation learning followed by a description of network embedding pipeline. Most commonly used downstream tasks to evaluate embeddings, their evaluation metrics and popular datasets are highlighted. Finally, we present the open-source libraries for network embedding.


The Pros and Cons of RDF-Star and Sparql-Star

#artificialintelligence

For regular readers of the (lately somewhat irregularly published) The Cagle Report, I've finally managed to get my feet underneath me at Data Science Central, and am gearing up with a number of new initiatives, including a video interview program that I'm getting underway as soon as I can get the last of the physical infrastructure (primarily some lighting and a decent green screen) in place. I recently purchased a new laptop one with enough speed and space to let me do any number of projects that my nearly four-year-old workhorse was just not equipped to handle. One of those projects was to start going through the dominant triple stores and explore them in greater depth as part of a general evaluation I hope to complete later in the year. The latest Ontotext GraphDB (9.7.0) had been on my list for a while, and I was generally surprised and pleased by what I found there, especially as I'd worked with older versions of GraphDB and found it useful but not quite there. These four items have become what I consider essential technologies for a W3C stack triple store to fully implement.


Performance Optimization and Risk Management for Trading

#artificialintelligence

Generate Income and make a living with Day Trading / Algorithmic Trading. This unique course provides the skills, knowledge, and techniques required to (realistically!) answer that question. The course uses rigorous quantitative methods and is 100% data-driven (Python coding required!).


Ontology-driven Knowledge Graph for Android Malware

arXiv.org Artificial Intelligence

We present MalONT2.0 -- an ontology for malware threat intelligence \cite{rastogi2020malont}. New classes (attack patterns, infrastructural resources to enable attacks, malware analysis to incorporate static analysis, and dynamic analysis of binaries) and relations have been added following a broadened scope of core competency questions. MalONT2.0 allows researchers to extensively capture all requisite classes and relations that gather semantic and syntactic characteristics of an android malware attack. This ontology forms the basis for the malware threat intelligence knowledge graph, MalKG, which we exemplify using three different, non-overlapping demonstrations. Malware features have been extracted from CTI reports on android threat intelligence shared on the Internet and written in the form of unstructured text. Some of these sources are blogs, threat intelligence reports, tweets, and news articles. The smallest unit of information that captures malware features is written as triples comprising head and tail entities, each connected with a relation. In the poster and demonstration, we discuss MalONT2.0, MalKG, as well as the dynamically growing knowledge graph, TINKER.


An Exploratory Study on Utilising the Web of Linked Data for Product Data Mining

arXiv.org Artificial Intelligence

The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads, to create multi-million words of product-related corpora that are later used in three different ways for creating of language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method to improve the accuracy on both tasks (with up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods however, are not as useful. Our analysis shows that this could be due to a number of reasons, including the biased domain representation in the structured data and lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.


Artificial Intelligence Algorithms for Natural Language Processing and the Semantic Web Ontology Learning

arXiv.org Artificial Intelligence

Evolutionary clustering algorithms have considered as the most popular and widely used evolutionary algorithms for minimising optimisation and practical problems in nearly all fields. In this thesis, a new evolutionary clustering algorithm star (ECA*) is proposed. Additionally, a number of experiments were conducted to evaluate ECA* against five state-of-the-art approaches. For this, 32 heterogeneous and multi-featured datasets were used to examine their performance using internal and external clustering measures, and to measure the sensitivity of their performance towards dataset features in the form of operational framework. The results indicate that ECA* overcomes its competitive techniques in terms of the ability to find the right clusters. Based on its superior performance, exploiting and adapting ECA* on the ontology learning had a vital possibility. In the process of deriving concept hierarchies from corpora, generating formal context may lead to a time-consuming process. Therefore, formal context size reduction results in removing uninterested and erroneous pairs, taking less time to extract the concept lattice and concept hierarchies accordingly. In this premise, this work aims to propose a framework to reduce the ambiguity of the formal context of the existing framework using an adaptive version of ECA*. In turn, an experiment was conducted by applying 385 sample corpora from Wikipedia on the two frameworks to examine the reduction of formal context size, which leads to yield concept lattice and concept hierarchy. The resulting lattice of formal context was evaluated to the original one using concept lattice-invariants. Accordingly, the homomorphic between the two lattices preserves the quality of resulting concept hierarchies by 89% in contrast to the basic ones, and the reduced concept lattice inherits the structural relation of the original one.


Satisfiability and Containment of Recursive SHACL

arXiv.org Artificial Intelligence

The Shapes Constraint Language (SHACL) is the recent W3C recommendation language for validating RDF data, by verifying certain shapes on graphs. Previous work has largely focused on the validation problem and the standard decision problems of satisfiability and containment, crucial for design and optimisation purposes, have only been investigated for simplified versions of SHACL. Moreover, the SHACL specification does not define the semantics of recursively-defined constraints, which led to several alternative recursive semantics being proposed in the literature. The interaction between these different semantics and important decision problems has not been investigated yet. In this article we provide a comprehensive study of the different features of SHACL, by providing a translation to a new first-order language, called SCL, that precisely captures the semantics of SHACL. We also present MSCL, a second-order extension of SCL, which allows us to define, in a single formal logic framework, the main recursive semantics of SHACL. Within this language we also provide an effective treatment of filter constraints which are often neglected in the related literature. Using this logic we provide a detailed map of (un)decidability and complexity results for the satisfiability and containment decision problems for different SHACL fragments. Notably, we prove that both problems are undecidable for the full language, but we present decidable combinations of interesting features, even in the face of recursion.


Cleaning Inconsistent Data in Temporal DL-Lite Under Best Repair Semantics

arXiv.org Artificial Intelligence

In this paper, we address the problem of handling inconsistent data in Temporal Description Logic (TDL) knowledge bases. Considering the data part of the knowledge base as the source of inconsistency over time, we propose an ABox repair approach. This is the first work handling the repair in TDL Knowledge bases. To do so, our goal is twofold: 1) detect temporal inconsistencies and 2) propose a data temporal reparation. For the inconsistency detection, we propose a reduction approach from TDL to DL which allows to provide a tight NP-complete upper bound for TDL concept satisfiability and to use highly optimised DL reasoners that can bring precise explanation (the set of inconsistent data assertions). Thereafter, from the obtained explanation, we propose a method for automatically computing the best repair in the temporal setting based on the allowed rigid predicates and the time order of assertions.