"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.
In this article, we develop a framework for comparing ontologies and place a number of the more prominent ontologies into it. We have selected 10 specific projects for this study, including general ontologies, domain-specific ones, and one knowledge representation system. The comparison framework includes general characteristics, such as the purpose of an ontology, its coverage (general or domain specific), its size, and the formalism used. It also includes the design process used in creating an ontology and the methods used to evaluate it. Characteristics that describe the content of an ontology include taxonomic organization, types of concept covered, top-level divisions, internal structure of concepts, representation of part-whole relations, and the presence and nature of additional axioms.
By extending Cyc's ontology and KB approximately 2%, Cycorp and Cleveland Clinic Foundation (CCF) have built a system to answer clinical researchers' ad hoc queries. The query may be long and complex, hence only partially understood at first, parsed into a set of CycL (higher-order logic) fragments with open variables. But, surprisingly often, after applying various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax), there is only one single way to fit those fragments together, one semantically meaningful formal query P. The system, SRA (for Semantic Research Assistant), dispatches a series of database calls and then combines, logically and arithmetically, their results into answers to P. Seeing the first few answers stream back, the user may realize that they need to abort, modify, and re-ask their query. Even before they push ASK, just knowing approximately how many answers would be returned can spark such editing. Besides real-time ad hoc query-answering, queries can be bundled and persist over time.
While the amount of data stored in current information systems continuously grows, and the processes making use of such data become more and more complex, extracting knowledge and getting insights from these data, as well as governing both data and the associated processes, are still challenging tasks. The problem is complicated by the proliferation of data sources and services both within a single organization, and in cooperating environments. Effectively accessing, integrating and managing data in complex organizations is still one of the main issues faced by the information technology industry today. Indeed, it is not surprising that data scientists spend a comparatively large amount of time in the data preparation phase of a project, compared with the data minining and knowledge discovery phase. Whether you call it data wrangling, data munging, or data integration, it is estimated that 50%-80% of a data scientists time is spent on collecting and organizing data for analysis.
Machine learning algorithms are now synonymous with finding patterns in data but not all patterns are suitable for statistics based data-driven techniques, for example when these patterns don't have explicitly labelled targets to learn from. In some cases, these patterns can be expressed precisely as a rule. Reasoning is the process of matching rule-based patterns or verifying that they don't exist in a graph. Because these patterns are found with deductive logic they can be found more efficiently and interpreted more easily than Machine Learning patterns which are induced from the data. This article will introduce some common patterns and how you can express them in the rule language, Datalog, using RDFox, a knowledge graph and semantic reasoning engine developed by Oxford Semantic Technologies.
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, either by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework as these state that data needs to be: Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial, research questions such as "was economic wealth equally distributed in the 18th century?", or "what are narratives constructed around disruptive media events?") and preparation phases (e.g. data collection, knowledge organisation, cleaning) of scholars need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.
The big rub on the first generation of graph databases was that although RDF triple stores were great at storing the simple sentence, they had a hard time with the adverbs, adjectives and clarifying phrases of your data story. If I wanted to store'John is a carpenter since 2001' or'John from Alberta Canada is a carpenter liked by 702 people', the syntax of old-school triple stores had a more tedious, but not impossible way of handling it. It involved creating extra nodes that were confusing to some and a process called reification. Until about a year ago, labeled property graphs (LPG) were better at color and detail than RDF, having a more intuitive syntax for clarifying adverbs, adjectives, and phrases. That was, of course, until recently.
Interoperability in the construction sector is a key issue and researchers, developers and designers have tackled since the introduction of CAD systems. Traditionally, engineers, architects and site operators interact and track their information exchange through paper or digitalized drawings and e-mails. With the introduction of Building Information Modelling (BIM) techniques and tools, operators are using new solutions and methods to keep track and exploit these data. Cover image: ifcOWL ontology (version IFC4ADD2) visualized thanks to WebVOWL, available hereWhat has been described as traditional method corresponds to Level 0 in well-known BIM levels definition. The concept of BIM level 1 represents the criteria needed for the full collaboration for the industry.
Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV, or CHV for short), is a collection of medical terms written in plain English. It provides a list of simple, easy, and clear terms that laymen prefer to use rather than an equivalent professional medical term. The National Library of Medicine (NLM) has integrated and mapped the CHV terms to their Unified Medical Language System (UMLS). These CHV terms mapped to 56000 professional concepts on the UMLS. We found that about 48% of these laymen's terms are still jargon and matched with the professional terms on the UMLS. In this paper, we present an enhanced word embedding technique that generates new CHV terms from a consumer-generated text. We downloaded our corpus from a healthcare social media and evaluated our new method based on iterative feedback to word embeddings using ground truth built from the existing CHV terms. Our feedback algorithm outperformed unmodified GLoVe and new CHV terms have been detected.
In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers contextual correlation among words, described in domain knowledge ontologies, to generate semantic explanations. To narrow down the search space for explanations, which is a major problem of long and complicated text data, we design a learnable anchor algorithm, to better extract explanations locally. A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible semantic explanations. An extensive experiment conducted on two real-world datasets shows that our approach generates more precise and insightful explanations compared with baseline approaches.