Semantic Web

The Semantic Zoo - Smart Data Hubs, Knowledge Graphs and Data Catalogs


Sometimes, you can enter into a technology too early. The groundwork for semantics was laid down in the late 1990s and early 2000s, with Tim Berners-Lee's stellar Semantic Web article, which debuted in Scientific American in 2001, seen by many as the movement's birth. Yet many early participants in the field of semantics discovered a harsh reality: computer systems were too slow to handle the intense indexing requirements the technology needed, the original specifications and APIs failed to handle important edge cases, and, perhaps most importantly, the number of real-world use cases where semantics made sense was simply too small; most could easily be met by existing approaches and technology. Semantics faded around 2008, echoing the pattern of the Artificial Intelligence Winter of the 1970s. JSON became all the rage, then mobile apps; big data came on the scene even as JavaScript underwent a radical transformation, and all of a sudden everyone wanted to be a data scientist (until they discovered that data science was mostly math).

Semantic interoperability in IoT


Semantic interoperability includes the ability to establish a shared meaning for the data exchanged, as well as the ability to interpret communication interfaces in the same way. Shared meaning here means that two different computer systems can not only communicate data in the basic sense (such as an integer with value 42), but also attach unambiguous meaning to that data: for example, that radiator three's temperature in the conference room on level five is currently 42 degrees Celsius. As we build large IoT systems we face several challenges of scale. Among them is the challenge of making equipment and subsystems from different vendors interoperable so that, across different time periods, they work together as intended.
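The idea of attaching unambiguous meaning to a bare value can be sketched in a few lines. The example below is a minimal illustration, not any particular IoT standard's API: the device and unit IRIs are invented, and the property names are borrowed loosely from the W3C SOSA vocabulary.

```python
# A minimal sketch of semantic annotation for an IoT reading.
# The ex:/unit: IRIs are hypothetical; the sosa: terms loosely follow
# the W3C SOSA observation vocabulary.

def annotate_reading(device_iri, value, unit_iri, property_iri):
    """Turn a bare value into subject-predicate-object triples."""
    obs = f"{device_iri}/observation/1"
    return [
        (obs, "rdf:type", "sosa:Observation"),
        (obs, "sosa:madeBySensor", device_iri),
        (obs, "sosa:observedProperty", property_iri),
        (obs, "sosa:hasSimpleResult", value),
        (obs, "ex:unit", unit_iri),
    ]

triples = annotate_reading(
    "ex:building/level5/conference-room/radiator3",
    42,
    "unit:DegreeCelsius",
    "ex:Temperature",
)

# The bare integer 42 is now tied to a device, a property and a unit,
# so a second system can interpret it unambiguously.
for s, p, o in triples:
    print(s, p, o)
```

Once the reading is expressed this way, a consumer does not need out-of-band agreements about what the number means; the context travels with the value.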

A Newbie's Guide to the Semantic Web


When I started learning about the semantic web, it was quite foreign territory and the practitioners all seemed to be talking over my head, so when I began to figure it out, I thought it would be valuable to write an introduction for those interested but a little put off. So what is the semantic web? Well, it's a whole bunch of things stitched together with many tools and different technologies and standards. Let's start with the problem that the semantic web is trying to solve. Microsoft explained it very well with its Bing commercials on search overload. Not that Bing solves it, but at least Microsoft is good at explaining the problem.

AtScale expands COVID-19 data semantic model


The intersection of the COVID-19 pandemic and analytics has been in focus almost since the pandemic began. Organizations like the Johns Hopkins Center for Systems Science and Engineering (CSSE), the New York Times and many governments, including states and municipalities in the US, have been publishing data around a number of indicators, including case counts, hospitalizations, deaths and rates of positive testing. The data sets are downloadable in open formats, and available for self-service analysis. But with so many datasets, and new circumstances like in-progress re-openings and new spikes in infection, what's really the best way to make sense of the data? And what other data, not specific to coronavirus/COVID-19, might be useful and germane?

An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web Artificial Intelligence

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.

Relational Learning Analysis of Social Politics using Knowledge Graph Embedding Artificial Intelligence

Knowledge Graphs (KGs) have gained considerable attention recently from both academia and industry. In fact, incorporating graph technology and the copious amounts of available graph datasets has led the research community to build sophisticated graph analytics tools. Therefore, the application of KGs has extended to tackle a plethora of real-life problems in diverse domains. Despite the abundance of currently available generic KGs, there is a vital need to construct domain-specific KGs. Further, quality and credibility should be assimilated in the process of constructing and augmenting KGs, particularly those propagated from mixed-quality resources such as social media data. This paper presents a novel credibility domain-based KG Embedding framework. This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology. The proposed approach makes use of various knowledge-based repositories to enrich the semantics of the textual contents, thereby facilitating the interoperability of information. The proposed framework also embodies a credibility module to ensure data quality and trustworthiness. The constructed KG is then embedded in a low-dimension semantically-continuous space using several embedding techniques. The utility of the constructed KG and its embeddings is demonstrated and substantiated on link prediction, clustering, and visualisation tasks.
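To make "embedding a KG in a low-dimension space" concrete, here is a minimal sketch of one well-known embedding technique, TransE, shown only as an illustration; the paper evaluates several such techniques, and the toy entities and untrained random vectors below are my own.

```python
# A minimal sketch of the TransE scoring idea: a triple (h, r, t) is
# plausible when the head vector translated by the relation vector lands
# near the tail vector. Vectors here are random (untrained), so the
# scores only illustrate the mechanics, not learned plausibility.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy vocabulary: every entity and relation gets a dim-dimensional vector.
entities = {e: rng.normal(size=dim) for e in ["paris", "france", "berlin", "germany"]}
relations = {r: rng.normal(size=dim) for r in ["capital_of"]}

def transe_score(h, r, t):
    """TransE distance: lower ||h + r - t|| means a more plausible triple."""
    return float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

# After training, h + r would approximate t for true triples; here we
# only demonstrate the scoring function itself.
s_true = transe_score("paris", "capital_of", "france")
s_false = transe_score("paris", "capital_of", "germany")
print(s_true, s_false)
```

Training would adjust the vectors so that true triples score lower than corrupted ones, which is what makes the resulting space useful for the link-prediction and clustering tasks the abstract mentions.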

NEMA: Automatic Integration of Large Network Management Databases Artificial Intelligence

Network management, whether for malfunction analysis, failure prediction, or performance monitoring and improvement, generally involves large amounts of data from different sources. To effectively integrate and manage these sources, automatically finding semantic matches among their schemas or ontologies is crucial. Existing approaches to database matching mainly fall into two categories. One focuses on schema-level matching based on schema properties such as field names, data types, constraints and schema structures. Network management databases contain massive numbers of tables (e.g., network products, incidents, security alerts and logs) from different departments and groups with nonuniform field names and schema characteristics, so it is not reliable to match them by those schema properties. The other category is based on instance-level matching using general string similarity techniques, which are not applicable to the matching of large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) deploying instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. The effectiveness and efficiency of NEMA are evaluated by conducting experiments based on ground-truth field pairs in large network management databases. Our measurement on large databases with 1,458 fields, each of which contains over 10 million records, reveals that the accuracies of NEMA are up to 95%. It achieves 2%-10% higher accuracy and 5x-14x speedup over baseline methods.
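The flavor of instance-level matching can be sketched as follows. The concrete metrics below (mean/spread closeness for numeric fields, Jaccard overlap of distinct values for non-numeric ones) are illustrative assumptions of mine, not NEMA's actual scoring functions.

```python
# A rough sketch of instance-level field matching: compare the *contents*
# of two columns rather than their names. The specific metrics here are
# invented for illustration, not the paper's.
import statistics

def numeric_similarity(a, b):
    """Compare two numeric columns by mean and spread closeness, in [0, 1]."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sa, sb = statistics.pstdev(a), statistics.pstdev(b)
    scale = max(abs(ma), abs(mb), 1e-9)
    mean_diff = abs(ma - mb) / scale
    sd_diff = abs(sa - sb) / max(sa, sb, 1e-9)
    return max(0.0, 1.0 - (mean_diff + sd_diff) / 2)

def categorical_similarity(a, b):
    """Jaccard overlap of the distinct values in two non-numeric columns."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Two fields with different names but similar contents score high.
latency_ms = [10, 12, 11, 13]
rtt_ms = [11, 12, 10, 14]
severity_a = ["low", "high", "medium"]
severity_b = ["HIGH", "low", "medium"]

print(numeric_similarity(latency_ms, rtt_ms))       # high similarity
print(categorical_similarity(severity_a, severity_b))  # 0.5 (case mismatch hurts)
```

Real systems like NEMA need far more robust metrics (and must scale to millions of records per field), but the principle is the same: field contents, not field names, drive the match.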

R2RML and RML Comparison for RDF Generation, their Rules Validation and Inconsistency Resolution Artificial Intelligence

In this paper, an overview of the state of the art on knowledge graph generation is provided, with focus on the two prevalent mapping languages: the W3C-recommended R2RML and its generalisation RML. We look in detail at their differences and explain how knowledge graphs, in the form of RDF graphs, can be generated with each of the two mapping languages. Then we assess whether the vocabulary terms were properly applied to the data and whether any violations occurred in their use, using either R2RML or RML to generate the desired knowledge graph.
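The core idea behind both mapping languages, applying declarative rules to source data to produce RDF triples, can be sketched in plain Python. The mapping structure, table, and IRIs below are simplified inventions for illustration; real R2RML/RML mappings are themselves expressed in RDF and support far richer term maps.

```python
# A simplified sketch of what an R2RML/RML-style mapping does: apply a
# subject IRI template and predicate-object maps to tabular rows to
# produce triples. The mapping and data below are invented examples.

mapping = {
    "subject_template": "http://example.org/person/{id}",
    "predicate_object_maps": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://xmlns.com/foaf/0.1/mbox", "email"),
    ],
}

rows = [
    {"id": "7", "name": "Ada", "email": "ada@example.org"},
]

def generate_triples(mapping, rows):
    """Yield one (subject, predicate, object) triple per mapped column."""
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        for predicate, column in mapping["predicate_object_maps"]:
            yield (subject, predicate, row[column])

triples = list(generate_triples(mapping, rows))
print(triples)
```

The difference the paper examines is, roughly, where such rules may point: R2RML is defined over relational databases, while RML generalises the source to heterogeneous formats such as CSV, JSON and XML.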

Event-QA: A Dataset for Event-Centric Question Answering over Knowledge Graphs Artificial Intelligence

Semantic Question Answering (QA) is the key technology to facilitate intuitive user access to semantic information stored in knowledge graphs. Whereas most of the existing QA systems and datasets focus on entity-centric questions, very little is known about the performance of these systems in the context of events. As new event-centric knowledge graphs emerge, datasets for such questions gain importance. In this paper we present the Event-QA dataset for answering event-centric questions over knowledge graphs. Event-QA contains 1,000 semantic queries and the corresponding English, German and Portuguese verbalisations for EventKG, a recently proposed event-centric knowledge graph with over 970 thousand events.

Orchestrating NLP Services for the Legal Domain Artificial Intelligence

Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.
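The orchestration pattern described here, composing independent language services into configurable workflows, can be sketched minimally. The service names, the toy heuristics inside them, and the manager API below are invented for illustration; they are not the Lynx project's actual components.

```python
# A toy sketch of workflow orchestration over NLP services: each service
# transforms a document dict, and the workflow itself is just data (a list
# of step names), so pipelines can be recomposed without code changes.
# All services and heuristics here are invented illustrations.
from typing import Callable, Dict, List

def tokenize(doc: dict) -> dict:
    doc["tokens"] = doc["text"].split()
    return doc

def detect_entities(doc: dict) -> dict:
    # Toy heuristic: treat capitalized tokens as entity mentions.
    doc["entities"] = [t for t in doc.get("tokens", []) if t.istitle()]
    return doc

def link_to_knowledge_graph(doc: dict) -> dict:
    # Link each mention to a hypothetical legal-KG IRI.
    doc["links"] = {e: f"http://example.org/kg/{e}" for e in doc.get("entities", [])}
    return doc

SERVICES: Dict[str, Callable[[dict], dict]] = {
    "tokenize": tokenize,
    "ner": detect_entities,
    "kg-link": link_to_knowledge_graph,
}

def run_workflow(doc: dict, steps: List[str]) -> dict:
    """Run the named services in order; the order is configuration, not code."""
    for step in steps:
        doc = SERVICES[step](doc)
    return doc

result = run_workflow({"text": "the Court cited Article Five"},
                      ["tokenize", "ner", "kg-link"])
print(result["entities"])  # ['Court', 'Article', 'Five']
```

Keeping the pipeline as data is what makes the orchestration "flexible": the same registry of services can serve many use cases simply by passing a different step list.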