The World Wide Web changed the way we live our lives, most notably in the ways we now share, consume and find information. There are many more webpages now than there are people, and links connect these webpages to each other in a giant network that is accessible from your favorite browser.
A downside of this success is that now there’s too much information, so much in fact, that we need machines to intelligently read these webpages and answer our questions. The Semantic Web is a movement and research community that brings together experts from different areas, examples being natural language processing, ontologies, databases, social media, networks and logic, to realize the vision of making the Web machine-readable.
Why is this such a difficult problem? The main reason is that much of the Web, even today, is in a natural language like English or French. These languages are very ambiguous, but we humans have a knack for understanding them due to a variety of factors, not the least of which is our immense store of background knowledge and common sense. Machines are not yet capable of understanding English at the same level as an adult human being, though impressive progress is being made.
To overcome this problem, the Semantic Web presents a vision of the Web as an interlinked network of concepts, relationships and entities, rather than an interlinked network of ‘natural’ webpages. Intelligent systems, often called ‘agents’, can consume the Semantic Web and answer complex questions that now require human labor. The research in the Semantic Web also helps search; e.g. the Google Knowledge Graph, which uses Semantic Web technology, can help you to answer some of your questions without even clicking on a link!
Semantic interoperability includes the ability to establish a shared meaning of the data exchanged, as well as the ability to similarly interpret communication interfaces. Shared meaning here means that two different computer systems, for example, not only can communicate data in the basic sense (such as an integer with value 42), but also attach unambiguous meaning to the data. For example, radiator three's temperature in the conference room on level five is currently 42 Celsius. As we build large IoT systems we are faced with several challenges of scale. Among them is the challenge of being able to make equipment and subsystems of different vendors interoperable and, over different time periods, work together and as intended.
When I started learning about the semantic web, it was quite foreign territory and the practitioners all seemed to be talking over my head, so when I began to figure it out, I thought it would be valuable to write an introduction for those interested but a little put off. Well it's a whole bunch of things stitched together with many tools and different technologies and standards. Let's start with the problem that the semantic web is trying to solve. Microsoft explained it very well with its Bing commercials on search overload. Not that Bing solves it, but at least Microsoft is good at explaining the problem.
The intersection of the COVID-19 pandemic and analytics has been in focus almost since the pandemic began. Organizations like Johns Hopkins Center for Systems Science and Engineering (CSSE), the New York Times and many governments, including states and municipalities in the US, have been publishing data around a number of indicators, including case counts, hospitalizations, deaths and rates of positive testing. The data sets are downloadable in open formats, and available for self-service analysis. But with so many datasets, new circumstances like in-progress re-openings and new spikes in infection, what's the best way really to make sense of the data? And what other data, not specific to Coronoavirus/COVID-19, might be useful and germane?
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.
Knowledge Graphs (KGs) have gained considerable attention recently from both academia and industry. In fact, incorporating graph technology and the copious of various graph datasets have led the research community to build sophisticated graph analytics tools. Therefore, the application of KGs has extended to tackle a plethora of real-life problems in dissimilar domains. Despite the abundance of the currently proliferated generic KGs, there is a vital need to construct domain-specific KGs. Further, quality and credibility should be assimilated in the process of constructing and augmenting KGs, particularly those propagated from mixed-quality resources such as social media data. This paper presents a novel credibility domain-based KG Embedding framework. This framework involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain ontology. The proposed approach makes use of various knowledge-based repositories to enrich the semantics of the textual contents, thereby facilitating the interoperability of information. The proposed framework also embodies a credibility module to ensure data quality and trustworthiness. The constructed KG is then embedded in a low-dimension semantically-continuous space using several embedding techniques. The utility of the constructed KG and its embeddings is demonstrated and substantiated on link prediction, clustering, and visualisation tasks.
Network management, whether for malfunction analysis, failure prediction, performance monitoring and improvement, generally involves large amounts of data from different sources. To effectively integrate and manage these sources, automatically finding semantic matches among their schemas or ontologies is crucial. Existing approaches on database matching mainly fall into two categories. One focuses on the schema-level matching based on schema properties such as field names, data types, constraints and schema structures. Network management databases contain massive tables (e.g., network products, incidents, security alert and logs) from different departments and groups with nonuniform field names and schema characteristics. It is not reliable to match them by those schema properties. The other category is based on the instance-level matching using general string similarity techniques, which are not applicable for the matching of large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) deploying instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. The effectiveness and efficiency of NEMA are evaluated by conducting experiments based on ground truth field pairs in large network management databases. Our measurement on large databases with 1,458 fields, each of which contains over 10 million records, reveals that the accuracies of NEMA are up to 95%. It achieves 2%-10% higher accuracy and 5x-14x speedup over baseline methods.
In this paper, an overview of the state of the art on knowledge graph generation is provided, with focus on the two prevalent mapping languages: the W3C recommended R2RML and its generalisation RML. We look into details on their differences and explain how knowledge graphs, in the form of RDF graphs, can be generated with each one of the two mapping languages. Then we assess if the vocabulary terms were properly applied to the data and no violations occurred on their use, either using R2RML or RML to generate the desired knowledge graph.
Semantic Question Answering (QA) is the key technology to facilitate intuitive user access to semantic information stored in knowledge graphs. Whereas most of the existing QA systems and datasets focus on entity-centric questions, very little is known about the performance of these systems in the context of events. As new event-centric knowledge graphs emerge, datasets for such questions gain importance. In this paper we present the Event-QA dataset for answering event-centric questions over knowledge graphs. Event-QA contains 1000 semantic queries and the corresponding English, German and Portuguese verbalisations for EventKG - a recently proposed event-centric knowledge graph with over 970 thousand events.
Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.