The World Wide Web changed the way we live our lives, most notably in the ways we now share, consume and find information. There are many more webpages now than there are people, and links connect these webpages to each other in a giant network that is accessible from your favorite browser.
A downside of this success is that there is now too much information; so much, in fact, that we need machines to intelligently read these webpages and answer our questions. The Semantic Web is a movement and research community that brings together experts from areas such as natural language processing, ontologies, databases, social media, networks and logic to realize the vision of making the Web machine-readable.
Why is this such a difficult problem? The main reason is that much of the Web, even today, is in a natural language like English or French. These languages are very ambiguous, but we humans have a knack for understanding them due to a variety of factors, not the least of which is our immense store of background knowledge and common sense. Machines are not yet capable of understanding English at the same level as an adult human being, though impressive progress is being made.
To overcome this problem, the Semantic Web presents a vision of the Web as an interlinked network of concepts, relationships and entities, rather than an interlinked network of ‘natural’ webpages. Intelligent systems, often called ‘agents’, can consume the Semantic Web and answer complex questions that now require human labor. The research in the Semantic Web also helps search; e.g. the Google Knowledge Graph, which uses Semantic Web technology, can help you to answer some of your questions without even clicking on a link!
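As a rough illustration of what an "interlinked network of concepts, relationships and entities" can look like to a machine, the following minimal sketch stores facts as (subject, predicate, object) triples and answers a two-step question by chaining patterns. The data, the predicate names and the helper functions `match` and `capitals_in` are hypothetical and chosen only for this example; real Semantic Web systems use standards such as RDF and SPARQL rather than Python tuples.

```python
# Hypothetical toy data: each fact is a (subject, predicate, object) triple.
TRIPLES = {
    ("Paris", "capitalOf", "France"),
    ("France", "locatedIn", "Europe"),
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "locatedIn", "Europe"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def capitals_in(triples, region):
    """Answer 'which capitals belong to countries located in `region`?'
    by chaining two triple patterns."""
    countries = {s for s, _, _ in match(triples, p="locatedIn", o=region)}
    return sorted(s for s, _, o in match(triples, p="capitalOf") if o in countries)


print(capitals_in(TRIPLES, "Europe"))  # ['Berlin', 'Paris']
```

The question is answered by following explicit links between entities, with no natural-language understanding involved, which is the point of the vision described above.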
Machine learning methods, especially deep neural networks, have achieved great success, but many of them rely on a large number of labeled samples for training. In real-world applications, we often need to address sample shortage due to, e.g., dynamic contexts with emerging prediction targets and costly sample annotation. Therefore, low-resource learning, which aims to learn robust prediction models without sufficient resources (especially training samples), is now being widely investigated. Among low-resource learning studies, many utilize auxiliary information in the form of a Knowledge Graph (KG), which is becoming increasingly popular for knowledge representation, to reduce the reliance on labeled samples. In this survey, we comprehensively review over 90 papers on KG-aware research for two major low-resource learning settings: zero-shot learning (ZSL), where new classes for prediction have never appeared in training, and few-shot learning (FSL), where new classes for prediction have only a small number of labeled samples available. We first introduce the KGs used in ZSL and FSL studies as well as existing and potential KG construction solutions, and then systematically categorize and summarize KG-aware ZSL and FSL methods, dividing them into paradigms such as mapping-based, data augmentation, propagation-based and optimization-based. We next present different applications, including not only KG-augmented tasks in Computer Vision and Natural Language Processing (e.g., image classification, text classification and knowledge extraction), but also tasks for KG curation (e.g., inductive KG completion), together with some typical evaluation resources for each task. We finally discuss some challenges and future directions on aspects such as new learning and reasoning paradigms, and the construction of high-quality KGs.
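To make the mapping-based paradigm more concrete, here is a minimal sketch under strong assumptions: input features are projected into the same space as class embeddings (which could, for instance, be derived from a KG), and the nearest class is predicted, whether or not it was seen in training. The embeddings, the matrix `W` and the function names are hypothetical illustrations, not taken from any surveyed method, and `W` is simply assumed to have been learned on seen classes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(W, x):
    """Apply a linear mapping W (a list of rows) to a feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W, x, class_embeddings):
    """Map x into the class-embedding space and return the most similar class."""
    z = project(W, x)
    return max(class_embeddings, key=lambda c: cosine(z, class_embeddings[c]))


# Hypothetical 2-D class embeddings; "zebra" plays the role of an unseen class.
CLASS_EMB = {"horse": [1.0, 0.0], "tiger": [0.0, 1.0], "zebra": [0.7, 0.7]}
# Assumed to have been learned from samples of the seen classes only.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(predict(W, [0.9, 0.8, 0.3], CLASS_EMB))  # zebra
```

Because the unseen class is represented only through its embedding, no labeled sample of it is needed at training time, which is the property the ZSL setting relies on.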
Narrative cartography is a discipline which studies the interwoven nature of stories and maps. However, conventional geovisualization techniques for narratives often encounter several prominent challenges, including the data acquisition & integration challenge and the semantic challenge. To tackle these challenges, in this paper, we propose the idea of narrative cartography with knowledge graphs (KGs). First, to tackle the data acquisition & integration challenge, we develop a set of KG-based GeoEnrichment toolboxes that allow users to search and retrieve relevant data from integrated cross-domain knowledge graphs for narrative mapping from within a GISystem. With the help of this tool, the data retrieved from KGs are directly materialized in a GIS format which is ready for spatial analysis and mapping. Two use cases, Magellan's expedition and World War II, are presented to show the effectiveness of this approach. At the same time, several limitations of this approach are identified, such as data incompleteness, semantic incompatibility, and the semantic challenge in geovisualization. For the latter two limitations, we propose a modular ontology for narrative cartography, which formalizes both the map content (Map Content Module) and the geovisualization process (Cartography Module). We demonstrate that, by representing both the map content and the geovisualization process in KGs (an ontology), we can realize both data reusability and map reproducibility for narrative cartography.
The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kind of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads to create multi-million-word product-related corpora that are later used in three different ways for creating language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method to improve the accuracy on both tasks (by up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods, however, are not as useful. Our analysis shows that this could be due to a number of reasons, including the biased domain representation in the structured data and a lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.
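As an illustration of the first processing step, the sketch below pulls literal product names out of N-Quads statements so that they could be collected into a text corpus. It is a simplified sketch, not the pipeline used in this work: the regular expression covers only plain and language-tagged literals rather than the full N-Quads grammar, and the choice of `http://schema.org/name` as the predicate of interest, the sample data and the function name `product_names` are all assumptions made for the example.

```python
import re

# Simplified pattern for one N-Quads statement whose object is a plain or
# language-tagged literal. Datatyped literals and IRI/blank-node objects
# are deliberately not matched.
LITERAL_QUAD = re.compile(
    r'^(?P<s>\S+)\s+<(?P<p>[^>]+)>\s+'
    r'"(?P<o>(?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+)?\s+'
    r'<(?P<g>[^>]+)>\s*\.\s*$'
)

NAME_PREDICATE = "http://schema.org/name"  # assumed predicate of interest


def product_names(lines):
    """Yield the literal value of every statement that uses NAME_PREDICATE."""
    for line in lines:
        m = LITERAL_QUAD.match(line.strip())
        if m and m.group("p") == NAME_PREDICATE:
            yield m.group("o")


# Hypothetical sample statements.
SAMPLE = [
    '_:b0 <http://schema.org/name> "Acme Road Bike"@en <http://example.org/page1> .',
    '_:b0 <http://schema.org/price> "499.00" <http://example.org/page1> .',
    '_:b1 <http://schema.org/name> "Trail Running Shoes" <http://example.org/page2> .',
    '_:b1 <http://schema.org/brand> _:b2 <http://example.org/page2> .',
]

print(list(product_names(SAMPLE)))  # ['Acme Road Bike', 'Trail Running Shoes']
```

At the scale of billions of statements, a streaming parser from an RDF library would be a more robust choice than a hand-written pattern; the generator form here only hints at line-by-line processing.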
Abbas, Nacira, Alghamdi, Kholoud, Alinam, Mortaza, Alloatti, Francesca, Amaral, Glenda, d'Amato, Claudia, Asprino, Luigi, Beno, Martin, Bensmann, Felix, Biswas, Russa, Cai, Ling, Capshaw, Riley, Carriero, Valentina Anita, Celino, Irene, Dadoun, Amine, De Giorgis, Stefano, Delva, Harm, Domingue, John, Dumontier, Michel, Emonet, Vincent, van Erp, Marieke, Arias, Paola Espinoza, Fallatah, Omaima, Ferrada, Sebastián, Ocaña, Marc Gallofré, Georgiou, Michalis, Gesese, Genet Asefa, Gillis-Webber, Frances, Giovannetti, Francesca, Buey, Marìa Granados, Harrando, Ismail, Heibi, Ivan, Horta, Vitor, Huber, Laurine, Igne, Federico, Jaradeh, Mohamad Yaser, Keshan, Neha, Koleva, Aneta, Koteich, Bilal, Kurniawan, Kabul, Liu, Mengya, Ma, Chuangtao, Maas, Lientje, Mansfield, Martin, Mariani, Fabio, Marzi, Eleonora, Mesbah, Sepideh, Mistry, Maheshkumar, Tirado, Alba Catalina Morales, Nguyen, Anna, Nguyen, Viet Bach, Oelen, Allard, Pasqual, Valentina, Paulheim, Heiko, Polleres, Axel, Porena, Margherita, Portisch, Jan, Presutti, Valentina, Pustu-Iren, Kader, Mendez, Ariam Rivas, Roshankish, Soheil, Rudolph, Sebastian, Sack, Harald, Sakor, Ahmad, Salas, Jaime, Schleider, Thomas, Shi, Meilin, Spinaci, Gianmarco, Sun, Chang, Tietz, Tabea, Dhouib, Molka Tounsi, Umbrico, Alessandro, Berg, Wouter van den, Xu, Weiqin
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a: "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of "everything" ranging from common sense concepts to location based entities. This knowledge graph should be "open to the public" in a FAIR manner democratizing this mass amount of knowledge." Although linked open data (LOD) is one knowledge graph, it is the closest realisation (and probably the only one) to a public FAIR Knowledge Graph (KG) of everything. Surely, LOD provides a unique testbed for experimenting with and evaluating research hypotheses on open and FAIR KGs. One of the most neglected FAIR issues about KGs is their ongoing evolution and long-term preservation. We want to investigate this problem, that is, to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports a collaborative effort performed by 9 teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective on the problem of knowledge graph evolution, substantiated by a set of research questions as the main subject of their investigation. In addition, they provide their working definition for KG preservation and evolution.
Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.
Hogan, Aidan, Blomqvist, Eva, Cochez, Michael, d'Amato, Claudia, de Melo, Gerard, Gutierrez, Claudio, Gayo, José Emilio Labra, Kirrane, Sabrina, Neumaier, Sebastian, Polleres, Axel, Navigli, Roberto, Ngomo, Axel-Cyrille Ngonga, Rashid, Sabbir M., Rula, Anisa, Schmelzeisen, Lukas, Sequeda, Juan, Staab, Steffen, Zimmermann, Antoine
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
Digital agriculture increasingly relies on the generation of large quantities of images. These images are processed with machine learning techniques to speed up the identification of objects, their classification, visualization, and interpretation. However, images must comply with the FAIR principles to facilitate their access, reuse, and interoperability. As stated in a recent paper authored by the Planteome team (Trigkakis et al., 2018), "Plant researchers could benefit greatly from a trained classification model that predicts image annotations with a high degree of accuracy." In this third Ontologies Community of Practice webinar, Justin Preece, Senior Faculty Research Assistant at Oregon State University, presents the module developed by the Planteome project using the Bio-Image Semantic Query User Environment (BISQUE), an online image analysis and storage platform of CyVerse.
We present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web languages such as the triple languages RDF/RDFS, the conceptual languages of the OWL 2 family, and rule languages. We further show how one may generalise them to so-called annotation domains, that cover also e.g.
It has been argued that it is much easier to convey logical statements using rules rather than OWL (or description logic (DL)) axioms. Based on recent theoretical developments on transformations between rules and DLs, we have developed ROWLTab, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL 2 DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. In this paper, we present ROWLTab, together with a user evaluation of its effectiveness compared to entering axioms using the standard Protege interface. Our evaluation shows that modeling with ROWLTab is much quicker than with the standard interface, while also being less prone to errors for hard modeling tasks.
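To illustrate the kind of rule-to-axiom correspondence involved, consider the following textbook-style example. It is chosen for illustration only and is not taken from the ROWLTab evaluation; the class and property names are hypothetical. The rule states that any person with at least one child is a parent, and the description logic axiom below it expresses the same statement.

```latex
% Rule form: a person who has some child is a parent.
\[
  \mathrm{Person}(x) \wedge \mathrm{hasChild}(x, y) \rightarrow \mathrm{Parent}(x)
\]
% Equivalent DL (OWL 2 DL) class inclusion axiom.
\[
  \mathrm{Person} \sqcap \exists\, \mathrm{hasChild}.\top \sqsubseteq \mathrm{Parent}
\]
```

The variable $y$ occurs only in the rule body, so it can be absorbed into the existential restriction $\exists\, \mathrm{hasChild}.\top$; rules whose variables are connected in more complex ways may not admit such a direct translation.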