"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.
When I started learning about the semantic web, it was quite foreign territory, and the practitioners all seemed to be talking over my head. So when I began to figure it out, I thought it would be valuable to write an introduction for those who are interested but a little put off. Well, it's a whole bunch of things stitched together with many tools and different technologies and standards. Let's start with the problem that the semantic web is trying to solve. Microsoft explained it very well with its Bing commercials on search overload. Not that Bing solves it, but at least Microsoft is good at explaining the problem.
The intersection of the COVID-19 pandemic and analytics has been in focus almost since the pandemic began. Organizations like the Johns Hopkins Center for Systems Science and Engineering (CSSE), the New York Times and many governments, including states and municipalities in the US, have been publishing data around a number of indicators, including case counts, hospitalizations, deaths and rates of positive testing. The data sets are downloadable in open formats and available for self-service analysis. But with so many data sets, and new circumstances like in-progress re-openings and new spikes in infection, what's really the best way to make sense of the data? And what other data, not specific to Coronavirus/COVID-19, might be useful and germane?
The screen shows four types of COVID-19 related entities, virus (blue), cell (pink), gene or genome (green), and disease or syndrome (red), and their relationships. All entities are Unified Medical Language System (UMLS) compatible for convenient knowledge sharing. The systems support 75 types of UMLS entities. Researchers from Florida Atlantic University's College of Engineering and Computer Science, in collaboration with FAU's Schmidt College of Medicine, have received a one-year, $90,000 National Science Foundation (NSF) RAPID project grant to conduct research using social networks and machine learning, facilitated by molecular genetics and viral infection, for COVID-19 modeling and risk evaluation. The project will create a web-based COVID-19 knowledge base, as well as a risk evaluation tool for individuals to assess their infection risk in a dynamic environment.
Research on semantic web services promises greater interoperability among software agents and web services by enabling content-based automated service discovery and interaction and by utilizing . Although this is to be based on use of shared ontologies published on the semantic web, services produced and described by different developers may well use different, perhaps partly overlapping, sets of ontologies. Interoperability will depend on ontology mappings and architectures supporting the associated translation processes. The question we ask is, does the traditional approach of introducing mediator agents to translate messages between requestors and services work in such an open environment? This article reviews some of the processing assumptions that were made in the development of the semantic web service modeling ontology OWL-S and argues that, as a practical matter, the translation function cannot always be isolated in mediators.
The infrastructure and tools necessary for large-scale data analytics, formerly the exclusive purview of experts, are increasingly available. Whereas a knowledgeable data-miner or domain expert can rightly be expected to exercise caution when required (for example, around fallacious conclusions supposedly supported by the data), the nonexpert may benefit from some judicious assistance. This article describes an end-to-end learning framework that allows a novice to create models from data easily by helping structure the model building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high-level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including in helping to avoid pitfalls such as data dredging. Prudence must be exercised to avoid these hazards as certain conclusions may only be supported if, for example, there is extra knowledge which gives reason to trust a narrower set of hypotheses.
In this article, we develop a framework for comparing ontologies and place a number of the more prominent ontologies into it. We have selected 10 specific projects for this study, including general ontologies, domain-specific ones, and one knowledge representation system. The comparison framework includes general characteristics, such as the purpose of an ontology, its coverage (general or domain specific), its size, and the formalism used. It also includes the design process used in creating an ontology and the methods used to evaluate it. Characteristics that describe the content of an ontology include taxonomic organization, types of concept covered, top-level divisions, internal structure of concepts, representation of part-whole relations, and the presence and nature of additional axioms.
By extending Cyc's ontology and KB approximately 2%, Cycorp and Cleveland Clinic Foundation (CCF) have built a system to answer clinical researchers' ad hoc queries. The query may be long and complex, and hence only partially understood at first, parsed into a set of CycL (higher-order logic) fragments with open variables. But, surprisingly often, after applying various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax), there is only a single way to fit those fragments together: one semantically meaningful formal query P. The system, SRA (for Semantic Research Assistant), dispatches a series of database calls and then combines, logically and arithmetically, their results into answers to P. Seeing the first few answers stream back, the user may realize that they need to abort, modify, and re-ask their query. Even before they push ASK, just knowing approximately how many answers would be returned can spark such editing. Besides real-time ad hoc query-answering, queries can be bundled and persist over time.
While the amount of data stored in current information systems continuously grows, and the processes making use of such data become more and more complex, extracting knowledge and getting insights from these data, as well as governing both data and the associated processes, are still challenging tasks. The problem is complicated by the proliferation of data sources and services both within a single organization and in cooperating environments. Effectively accessing, integrating and managing data in complex organizations is still one of the main issues faced by the information technology industry today. Indeed, it is not surprising that data scientists spend a large amount of time in the data preparation phase of a project, compared with the data mining and knowledge discovery phase. Whether you call it data wrangling, data munging, or data integration, it is estimated that 50%-80% of a data scientist's time is spent on collecting and organizing data for analysis.
Machine learning algorithms are now synonymous with finding patterns in data, but not all patterns are suitable for statistics-based, data-driven techniques, for example when these patterns don't have explicitly labelled targets to learn from. In some cases, these patterns can be expressed precisely as a rule. Reasoning is the process of matching rule-based patterns, or verifying that they don't exist, in a graph. Because these patterns are found with deductive logic, they can be found more efficiently and interpreted more easily than machine learning patterns, which are induced from the data. This article will introduce some common patterns and how you can express them in the rule language Datalog, using RDFox, a knowledge graph and semantic reasoning engine developed by Oxford Semantic Technologies.
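To make the idea of matching a rule-based pattern in a graph concrete, here is a minimal sketch in plain Python. It does not use RDFox or its rule syntax; the `partOf` predicate, the sample facts, and the helper name `infer_transitive` are all hypothetical, chosen only to illustrate how a single Datalog-style transitivity rule can be applied repeatedly until nothing new is derived.

```python
from typing import Set, Tuple

# A fact is a (subject, predicate, object) triple, as in a simple knowledge graph.
Triple = Tuple[str, str, str]


def infer_transitive(facts: Set[Triple], predicate: str) -> Set[Triple]:
    """Naive forward chaining for one rule, written in generic Datalog notation:

        p(x, z) :- p(x, y), p(y, z).

    The rule is re-applied until a pass derives no new facts (a fixpoint).
    """
    inferred = set(facts)
    while True:
        # All (subject, object) pairs currently linked by the predicate.
        edges = [(s, o) for (s, p, o) in inferred if p == predicate]
        # Join the edges on the shared middle node y.
        new = {
            (x, predicate, z)
            for (x, y1) in edges
            for (y2, z) in edges
            if y1 == y2
        } - inferred
        if not new:
            return inferred
        inferred |= new


# Hypothetical example data, not taken from the article.
facts = {
    ("wheel", "partOf", "axle"),
    ("axle", "partOf", "chassis"),
    ("chassis", "partOf", "car"),
    ("car", "madeBy", "acme"),
}

closure = infer_transitive(facts, "partOf")
derived = sorted(closure - facts)
print(derived)
```

Every derived triple can be traced back to the explicit facts and the one rule that produced it, which is the interpretability advantage the paragraph above attributes to deductive patterns. A production reasoner would typically use incremental evaluation rather than this naive loop.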