The World Wide Web changed the way we live our lives, most notably in the ways we now share, consume and find information. There are many more webpages now than there are people, and links connect these webpages to each other in a giant network that is accessible from your favorite browser.
A downside of this success is that there is now too much information, so much, in fact, that we need machines to intelligently read these webpages and answer our questions. The Semantic Web is a movement and research community that brings together experts from different areas, such as natural language processing, ontologies, databases, social media, networks and logic, to realize the vision of making the Web machine-readable.
Why is this such a difficult problem? The main reason is that much of the Web, even today, is in a natural language like English or French. These languages are very ambiguous, but we humans have a knack for understanding them due to a variety of factors, not the least of which is our immense store of background knowledge and common sense. Machines are not yet capable of understanding English at the same level as an adult human being, though impressive progress is being made.
To overcome this problem, the Semantic Web presents a vision of the Web as an interlinked network of concepts, relationships and entities, rather than an interlinked network of ‘natural’ webpages. Intelligent systems, often called ‘agents’, can consume the Semantic Web and answer complex questions that currently require human labor. Semantic Web research also helps search: for example, the Google Knowledge Graph, which uses Semantic Web technology, can answer some of your questions without you even clicking on a link!
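To make the idea of an interlinked network of entities concrete, the following minimal sketch stores facts as subject-predicate-object triples (the RDF data model the Semantic Web builds on) and answers a simple question by joining two triple patterns. The entities, predicates, and the `match` and `eu_capitals` helpers are hypothetical illustrations, not part of any real knowledge graph or library.

```python
# A toy knowledge graph: each fact is a (subject, predicate, object) triple.
# Entity and predicate names are hypothetical, chosen for illustration.
TRIPLES = {
    ("Paris", "capitalOf", "France"),
    ("Berlin", "capitalOf", "Germany"),
    ("France", "memberOf", "EuropeanUnion"),
    ("Germany", "memberOf", "EuropeanUnion"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def eu_capitals():
    """'Which capitals belong to EU member states?' as a join of two patterns."""
    members = {s for s, _, _ in match(p="memberOf", o="EuropeanUnion")}
    return sorted(s for s, _, o in match(p="capitalOf") if o in members)


print(eu_capitals())  # ['Berlin', 'Paris']
```

An agent never reads prose here: the question is answered purely by following typed links between entities.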
MetaShare is a knowledge-based system that supports the creation of data management plans and provides functionality to support researchers as they implement those plans. MetaShare is a community-based, user-driven system designed around the parallels between the scientific data life cycle and the development cycle of knowledge-based systems. MetaShare will provide recommendations and guidance to researchers based on the practices and decisions of similar projects. Using formal knowledge representation in the form of ontologies and rules, the system will be able to generate data collection, dissemination, and management tools that facilitate using and sharing scientific data. MetaShare, which initially targets the research community at the University of Texas at El Paso, is being developed on a Web platform using Semantic Web technologies. This paper presents a roadmap for the development of MetaShare, justifying its functionality and implementation decisions. In addition, the paper argues for the system's return on investment for researchers and describes its planned evaluation.
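MetaShare's recommendations are described as being based on the practices of similar projects. One simple way such a recommendation could work, sketched here under our own assumptions rather than as MetaShare's actual design, is to describe each past project by a set of features, pick the most similar projects by Jaccard similarity, and suggest the practices they adopted. The project records, feature names, and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical records of past projects: descriptive features and the
# data management practices they adopted.
PAST_PROJECTS = [
    {"features": {"geospatial", "satellite", "large-volume"},
     "practices": {"netCDF format", "institutional repository"}},
    {"features": {"survey", "human-subjects"},
     "practices": {"anonymization", "restricted access"}},
    {"features": {"geospatial", "field-samples"},
     "practices": {"CSV format", "institutional repository"}},
]


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(features, k=2):
    """Rank practices by how often they occur among the k most similar projects."""
    ranked = sorted(PAST_PROJECTS,
                    key=lambda p: jaccard(features, p["features"]),
                    reverse=True)[:k]
    votes = Counter(pr for p in ranked for pr in p["practices"])
    # Sort by vote count (descending), then alphabetically for stable output.
    return sorted(votes, key=lambda pr: (-votes[pr], pr))


print(recommend({"geospatial", "satellite"}))
# ['institutional repository', 'CSV format', 'netCDF format']
```

A rule- and ontology-based system would replace the flat feature sets with typed project descriptions, but the "learn from similar projects" step keeps the same shape.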
In this paper, we describe the approach of the Earth, Life and Semantic Web (ELSEWeb) project, which facilitates the discovery and transformation of Earth observation data sources for the creation of species distribution models (data-to-model transformations). ELSEWeb uses AI knowledge representation and reasoning techniques developed by the Semantic Web community to automate the discovery and processing of voluminous, heterogeneous satellite imagery and other geospatial data available at the Earth Data Analysis Center, so that these data can be included in Lifemapper species distribution models. The ELSEWeb semantic infrastructure opens the possibility of a combinatorial explosion of scientific results, generated automatically by orchestrating data mash-ups and service compositions. We report on the key elements that contributed to the ELSEWeb project and on the role of automated reasoning in streamlining species distribution model generation and execution.
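The automated orchestration mentioned above can be pictured as a search for a chain of services that turns an available data type into the input a model needs. The sketch below performs a breadth-first search over services described only by their input and output types. The service names and data types are hypothetical, and ELSEWeb's actual approach uses knowledge representation and reasoning techniques rather than this hand-coded search.

```python
from collections import deque

# Hypothetical services, each described by the data type it consumes and produces.
SERVICES = {
    "reproject": ("HDF_Satellite_Image", "GeoTIFF_WGS84"),
    "clip_to_region": ("GeoTIFF_WGS84", "GeoTIFF_Region"),
    "to_ascii_grid": ("GeoTIFF_Region", "ASCII_Grid"),
    "summarize_stats": ("GeoTIFF_WGS84", "CSV_Table"),
}


def compose(source, target):
    """Return the shortest list of service names turning `source` into `target`,
    or None if no chain exists (breadth-first search over data types)."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for name, (inp, out) in SERVICES.items():
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None


# Assume the species distribution model expects an ASCII grid as input.
print(compose("HDF_Satellite_Image", "ASCII_Grid"))
# ['reproject', 'clip_to_region', 'to_ascii_grid']
```

Because every new service or data source extends the graph of types, the number of reachable compositions grows quickly, which is the effect the abstract refers to.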
Abstracts of the invited talks presented at the AAAI Fall Symposium on Discovery Informatics: AI Takes a Science-Centered View on Big Data. Talks include: A Data Lifecycle Approach to Discovery Informatics; Generating Biomedical Hypotheses Using Semantic Web Technologies; Socially Intelligent Science; Representing and Reasoning with Experimental and Quasi-Experimental Designs; Bioinformatics Computation of Metabolic Models from Sequenced Genomes; Climate Informatics: Recent Advances and Challenge Problems for Machine Learning in Climate Science; Predictive Modeling of Patient State and Therapy Optimization; Case Studies in Data-Driven Systems: Building Carbon Maps to Finding Neutrinos; Computational Analysis of Complex Human Disorders; and Look at This Gem: Automated Data Prioritization for Scientific Discovery of Exoplanets, Mineral Deposits, and More.
Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. This is especially true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. For more than ten years, evaluating the capabilities of these systems has driven the community to establish various KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of curated leaderboards leaves the research field without a global view and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, lack central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results, covering 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community: https://kgqa.github.io/leaderboard. Our analysis highlights existing problems in the evaluation of KGQA systems, and we point to possible improvements for future evaluations.
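A leaderboard needs one comparable score per system. As an illustration, assuming per-question gold and system answer sets, the sketch below computes F1 per question, averages it over questions, and ranks systems. The benchmark data, system names, and the treatment of missing answers are our own assumptions; individual KGQA benchmarks such as QALD define their own metrics, which should be followed when reporting results.

```python
def f1(gold, predicted):
    """Per-question F1 between a gold answer set and a predicted answer set."""
    hits = len(gold & predicted)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_answers, system_answers):
    """Average per-question F1; a missing prediction counts as an empty answer."""
    scores = [f1(gold, system_answers.get(qid, set()))
              for qid, gold in gold_answers.items()]
    return sum(scores) / len(scores)


def leaderboard(gold_answers, systems):
    """Rank systems by averaged F1, best first."""
    rows = [(name, macro_f1(gold_answers, answers)) for name, answers in systems.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical benchmark with two questions and two systems.
GOLD = {"q1": {"dbr:Berlin"}, "q2": {"dbr:Elbe", "dbr:Spree"}}
SYSTEMS = {
    "SystemA": {"q1": {"dbr:Berlin"}, "q2": {"dbr:Spree"}},
    "SystemB": {"q1": {"dbr:Paris"}, "q2": {"dbr:Elbe", "dbr:Spree"}},
}
for name, score in leaderboard(GOLD, SYSTEMS):
    print(f"{name}: {score:.3f}")
# SystemA: 0.833
# SystemB: 0.500
```

Fixing the metric definition in one shared place is exactly what makes a central leaderboard trustworthy: scores copied from papers may silently use different averaging schemes.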
Each year, the International Semantic Web Conference (ISWC) organizes a set of Semantic Web Challenges, competitions intended to advance state-of-the-art solutions in selected problem domains. The Semantic Answer Type and Relation Prediction (SMART) task is one of the ISWC 2021 Semantic Web challenges. This is the second edition of the challenge, following a successful SMART 2020 at ISWC 2020. This year's version focuses on two sub-tasks that are very important to Knowledge Base Question Answering (KBQA): Answer Type Prediction and Relation Prediction. Question type and answer type prediction can play a key role in KBQA systems by providing insights about the expected answer that help generate correct queries or rank answer candidates. More concretely, given a question in natural language, the first task is to predict the answer type using a target ontology (e.g., DBpedia or Wikidata). Similarly, the second task is to identify relations in the natural language query and link them to the relations in a target ontology. This paper discusses the task descriptions, benchmark datasets, and evaluation metrics. For more information, please visit https://smart-task.github.io/2021/.
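To illustrate what answer type prediction asks for, here is a deliberately naive keyword baseline that maps a question to a coarse answer category. We assume coarse categories boolean, literal (with subtypes such as number and date), and resource; the rules and the `predict_category` function are our own and are not the challenge's reference system. A real submission would also predict fine-grained classes from the target ontology for resource questions.

```python
BOOLEAN_STARTS = ("is ", "are ", "was ", "were ", "does ", "do ", "did ",
                  "can ", "has ", "have ")


def predict_category(question):
    """Return a (category, literal_type) guess using hand-written keyword rules."""
    q = question.lower().strip()
    if q.startswith(BOOLEAN_STARTS):
        return ("boolean", None)
    if q.startswith(("how many", "how much")):
        return ("literal", "number")
    if q.startswith("when") or "what year" in q or "which year" in q:
        return ("literal", "date")
    # Everything else is assumed to ask for an entity (resource) in the KB.
    return ("resource", None)


for q in ["Is Berlin the capital of Germany?",
          "How many moons does Mars have?",
          "When was the Eiffel Tower built?",
          "Who wrote Dracula?"]:
    print(q, "->", predict_category(q))
```

Even this crude guess constrains query generation: a boolean question suggests an ASK query, and a number answer rules out entity candidates.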
OWLOOP is an Application Programming Interface (API) for using the Web Ontology Language (OWL) by means of Object-Oriented Programming (OOP). Software architectures are commonly designed using the OOP paradigm to increase their modularity. If the components of an architecture also exploit OWL ontologies for knowledge representation and reasoning, they need to be interfaced with OWL axioms. Since OWL does not adhere to the OOP paradigm, such an interface often leads to boilerplate code that affects modularity; OWLOOP is designed to address this issue as well as the associated computational aspects. We present an extension of the OWL-API that provides a general-purpose interface between OWL axioms subject to reasoning and modular OOP object hierarchies. This manuscript was submitted to the SoftwareX Elsevier journal on the 12th of January 2021, revised on the 18th of November 2021, accepted on the 14th of December 2021, and published on the 30th of December 2021.
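The core idea of hiding OWL axioms behind ordinary objects can be illustrated with a small sketch. Here a hypothetical in-memory `AxiomStore` holds class-assertion axioms, and an `Individual` object reads its types from the store and writes changes back. None of these names belong to OWLOOP or the OWL-API (both are Java); the sketch only shows the pattern of keeping objects and axioms synchronized, and it performs no reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class AxiomStore:
    """Hypothetical stand-in for an ontology: (class, individual) assertion axioms."""
    class_assertions: set = field(default_factory=set)


@dataclass
class Individual:
    """Object-oriented view of one individual and the classes it belongs to."""
    name: str
    store: AxiomStore
    types: set = field(default_factory=set)

    def read_axioms(self):
        """Refresh the object's state from the axioms currently in the store."""
        self.types = {c for c, i in self.store.class_assertions if i == self.name}
        return self

    def write_axioms(self):
        """Make the store mirror the object's state: add new, remove dropped assertions."""
        current = {c for c, i in self.store.class_assertions if i == self.name}
        for c in self.types - current:
            self.store.class_assertions.add((c, self.name))
        for c in current - self.types:
            self.store.class_assertions.discard((c, self.name))
        return self


store = AxiomStore({("Robot", "r1"), ("Person", "alice")})
r1 = Individual("r1", store).read_axioms()
r1.types.add("MobileAgent")
r1.write_axioms()
print(sorted(store.class_assertions))
# [('MobileAgent', 'r1'), ('Person', 'alice'), ('Robot', 'r1')]
```

Application code manipulates `r1.types` as a plain set; only `read_axioms` and `write_axioms` touch the axiom layer, which is the kind of boilerplate an OOP interface to OWL aims to centralize.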
Machine learning methods, especially deep neural networks, have achieved great success, but many of them rely on a large number of labeled samples for training. In real-world applications, we often need to address sample shortage due to, e.g., dynamic contexts with emerging prediction targets and costly sample annotation. Therefore, low-resource learning, which aims to learn robust prediction models without enough resources (especially training samples), is now widely investigated. Among low-resource learning studies, many utilize auxiliary information in the form of a Knowledge Graph (KG), an increasingly popular means of knowledge representation, to reduce the reliance on labeled samples. In this survey, we comprehensively review over 90 papers on KG-aware research for two major low-resource learning settings: zero-shot learning (ZSL), where the classes to be predicted never appear in training, and few-shot learning (FSL), where only a small number of labeled samples are available for the new classes. We first introduce the KGs used in ZSL and FSL studies as well as existing and potential KG construction solutions, and then systematically categorize and summarize KG-aware ZSL and FSL methods into paradigms such as mapping-based, data augmentation, propagation-based, and optimization-based methods. We next present different applications, including not only KG-augmented tasks in Computer Vision and Natural Language Processing (e.g., image classification, text classification and knowledge extraction) but also tasks for KG curation (e.g., inductive KG completion), together with some typical evaluation resources for each task. Finally, we discuss challenges and future directions in aspects such as new learning and reasoning paradigms and the construction of high-quality KGs.
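Of the paradigms listed, the mapping-based one is perhaps the easiest to picture: learn a function from input features into a semantic space of class embeddings (which, in KG-aware methods, would be derived from the KG), then label an input with the nearest class embedding, including classes never seen in training. The toy sketch below assumes two-dimensional features, hand-picked class embeddings, and a linear map fitted by gradient descent; all names and values are illustrative, not taken from any surveyed method.

```python
import math

# Hypothetical class embeddings; in KG-aware ZSL these would come from the KG.
CLASS_EMB = {"horse": (1.0, 0.0), "tiger": (0.0, 1.0), "zebra": (0.7, 0.7)}

# Toy training data: (feature vector, seen class). "zebra" never appears here.
TRAIN = [((2.0, 0.0), "horse"), ((0.0, 2.0), "tiger")]


def apply(W, x):
    """Multiply the 2x2 matrix W by the vector x."""
    return tuple(sum(W[r][c] * x[c] for c in range(2)) for r in range(2))


def fit(data, lr=0.05, steps=200):
    """Fit W minimizing sum ||W x - emb(y)||^2 over training pairs by gradient descent."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in data:
            err = [p - t for p, t in zip(apply(W, x), CLASS_EMB[y])]
            for r in range(2):
                for c in range(2):
                    grad[r][c] += 2 * err[r] * x[c]
        for r in range(2):
            for c in range(2):
                W[r][c] -= lr * grad[r][c]
    return W


def cosine(a, b):
    """Cosine similarity of two non-zero 2D vectors."""
    return sum(p * q for p, q in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def predict(W, x, candidates):
    """Label x with the candidate class whose embedding is most cosine-similar."""
    z = apply(W, x)
    return max(candidates, key=lambda c: cosine(z, CLASS_EMB[c]))


W = fit(TRAIN)
# Passing all classes is the generalized ZSL setting; standard ZSL passes only unseen ones.
print(predict(W, (1.4, 1.4), CLASS_EMB))  # zebra
```

The unseen class is recognized only because its embedding sits in the same semantic space as the seen ones; the quality of that space, and hence of the KG it comes from, bounds what the model can do.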
In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose a novel use of shapes, in which a set of shapes is used to extract a subgraph, the so-called shape fragment, from an RDF graph. Our proposed mechanism fits in the framework of Linked Data Fragments. In this paper, (i) we define our extraction mechanism formally, building on recently proposed SHACL formalizations; (ii) we establish correctness properties, which relate shape fragments to notions of provenance for database queries; (iii) we compare shape fragments with SPARQL queries; (iv) we discuss implementation options; and (v) we present initial experiments demonstrating that shape fragments are a feasible new idea.
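For intuition, the sketch below uses a drastically simplified notion of a shape: a target class plus a list of required properties. A target node conforms if it has at least one value for every required property, and the fragment collects the type triple and required-property triples of each conforming node. This simplification, the example graph, and the `shape_fragment` function are our own illustrations, not the paper's formal definition, which builds on SHACL formalizations.

```python
# Hypothetical RDF graph as a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:email", '"alice@example.org"'),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:email", '"bob@example.org"'),
    ("ex:acme", "ex:name", '"ACME"'),
}

# Simplified shape: targets instances of ex:Person and requires an ex:name.
SHAPE = {"target_class": "ex:Person", "required": ["ex:name"]}


def shape_fragment(graph, shape):
    """Return the triples that witness conformance of the conforming target nodes."""
    targets = {s for s, p, o in graph
               if p == "rdf:type" and o == shape["target_class"]}
    fragment = set()
    for node in targets:
        needed = [{t for t in graph if t[0] == node and t[1] == prop}
                  for prop in shape["required"]]
        if all(needed):  # conforms: every required property has a value
            fragment.add((node, "rdf:type", shape["target_class"]))
            for triples in needed:
                fragment |= triples
    return fragment


for triple in sorted(shape_fragment(GRAPH, SHAPE)):
    print(triple)
# ('ex:alice', 'ex:name', '"Alice"')
# ('ex:alice', 'rdf:type', 'ex:Person')
```

Note that the fragment omits `ex:email` (not mentioned by the shape), `ex:bob` (does not conform), and `ex:acme` (not targeted), so the shape acts as a declarative query over the graph.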
Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as sets of deletions and additions of RDF triples. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph, a recently emerging research subject in the Semantic Web community. We introduce the methodology for generating Wikidated 1.0 from dumps of Wikidata, discuss its implementation and limitations, and present statistical characteristics of the dataset.
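Encoding changes between revisions as sets of deleted and added triples boils down to set differences. The sketch below, assuming each revision is available as a set of triples (the item, property, and values are illustrative and not taken from the dataset), computes such a delta and replays it to recover the later revision.

```python
# Two hypothetical consecutive revisions of one Wikidata item, as triple sets.
REV_1 = {
    ("wd:Q64", "rdfs:label", '"Berlin"@en'),
    ("wd:Q64", "wdt:P1082", '"3600000"'),
}
REV_2 = {
    ("wd:Q64", "rdfs:label", '"Berlin"@en'),
    ("wd:Q64", "wdt:P1082", '"3700000"'),
    ("wd:Q64", "wdt:P17", "wd:Q183"),
}


def diff(old, new):
    """Encode the change from `old` to `new` as (deletions, additions)."""
    return old - new, new - old


def apply_delta(revision, deletions, additions):
    """Replay a delta: remove deleted triples, then add new ones."""
    return (revision - deletions) | additions


deletions, additions = diff(REV_1, REV_2)
print("deleted:", sorted(deletions))
print("added:  ", sorted(additions))
assert apply_delta(REV_1, deletions, additions) == REV_2
```

Storing only deltas keeps the full history compact, since most revisions touch a handful of triples, and any revision can be rebuilt by replaying deltas in order.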