"An ontology defines the terms used to describe and represent an area of knowledge. … Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them."
– from OWL Web Ontology Language Use Cases and Requirements. W3C Recommendation (10 February 2004). Jeff Heflin, editor.
In this article, we develop a framework for comparing ontologies and place a number of the more prominent ontologies into it. We have selected 10 specific projects for this study, including general ontologies, domain-specific ones, and one knowledge representation system. The comparison framework includes general characteristics, such as the purpose of an ontology, its coverage (general or domain specific), its size, and the formalism used. It also includes the design process used in creating an ontology and the methods used to evaluate it. Characteristics that describe the content of an ontology include taxonomic organization, types of concept covered, top-level divisions, internal structure of concepts, representation of part-whole relations, and the presence and nature of additional axioms.
Research on semantic web services promises greater interoperability among software agents and web services by enabling content-based automated service discovery and interaction and by utilizing . Although this is to be based on use of shared ontologies published on the semantic web, services produced and described by different developers may well use different, perhaps partly overlapping, sets of ontologies. Interoperability will depend on ontology mappings and architectures supporting the associated translation processes. The question we ask is, does the traditional approach of introducing mediator agents to translate messages between requestors and services work in such an open environment? This article reviews some of the processing assumptions that were made in the development of the semantic web service modeling ontology OWL-S and argues that, as a practical matter, the translation function cannot always be isolated in mediators.
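The mapping problem described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an ontology mapping is a simple term-to-term renaming table; real ontology mappings are generally richer than this, and the function name `translate_message` and all term names are hypothetical rather than taken from OWL-S. The sketch shows how a mediator working from a partial mapping leaves some terms untranslated when the requestor's and the service's ontologies only partly overlap.

```python
from typing import Dict, Tuple


def translate_message(
    message: Dict[str, str], mapping: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Rename a message's terms using a term-to-term mapping (an assumption).

    Returns a pair (translated, untranslated). Terms that the mapping does
    not cover are returned separately so the caller can see what a mediator
    alone could not handle.
    """
    translated: Dict[str, str] = {}
    untranslated: Dict[str, str] = {}
    for term, value in message.items():
        if term in mapping:
            translated[mapping[term]] = value
        else:
            untranslated[term] = value
    return translated, untranslated


if __name__ == "__main__":
    # Hypothetical requestor terms ("req:") and service terms ("svc:").
    request = {
        "req:departureCity": "Boston",
        "req:arrivalCity": "Paris",
        "req:seatClass": "economy",
    }
    partial_mapping = {
        "req:departureCity": "svc:origin",
        "req:arrivalCity": "svc:destination",
    }
    print(translate_message(request, partial_mapping))
```

Any term left in the second dictionary has to be resolved somewhere other than the mediator, which is one simple way to picture why translation may not always be isolated there.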
By extending Cyc's ontology and KB by approximately 2%, Cycorp and Cleveland Clinic Foundation (CCF) have built a system to answer clinical researchers' ad hoc queries. A query may be long and complex, and hence only partially understood at first; it is parsed into a set of CycL (higher-order logic) fragments with open variables. But, surprisingly often, after applying various constraints (medical domain knowledge, common sense, discourse pragmatics, syntax), there is only one way to fit those fragments together, yielding one semantically meaningful formal query P. The system, SRA (for Semantic Research Assistant), dispatches a series of database calls and then combines, logically and arithmetically, their results into answers to P. Seeing the first few answers stream back, the user may realize that they need to abort, modify, and re-ask their query. Even before they push ASK, just knowing approximately how many answers would be returned can spark such editing. Besides real-time ad hoc query-answering, queries can be bundled and persist over time. One bundle of 275 queries is rerun quarterly by CCF to produce the procedures and outcomes data it needs to report to STS (Society of Thoracic Surgeons, an external hospital accreditation and ranking body); another bundle covers ACC (American College of Cardiology) reporting.
The infrastructure and tools necessary for large-scale data analytics, formerly the exclusive purview of experts, are increasingly available. Whereas a knowledgeable data-miner or domain expert can rightly be expected to exercise caution when required (for example, around fallacious conclusions supposedly supported by the data), the nonexpert may benefit from some judicious assistance. This article describes an end-to-end learning framework that allows a novice to create models from data easily by helping structure the model building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high-level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including in helping to avoid pitfalls such as data dredging. Prudence must be exercised to avoid these hazards as certain conclusions may only be supported if, for example, there is extra knowledge which gives reason to trust a narrower set of hypotheses.
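The data-dredging hazard mentioned above can be made concrete with a small calculation. The abstract does not say which safeguards the framework uses, so the Bonferroni correction below is only a familiar stand-in, and the function names are hypothetical. The sketch assumes independent tests and shows how the chance of at least one spurious finding grows with the number of hypotheses examined, and why a narrower set of hypotheses permits a less stringent per-test threshold.

```python
def family_wise_error_rate(alpha: float, num_hypotheses: int) -> float:
    """Probability of at least one false positive across independent tests,
    each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** num_hypotheses


def bonferroni_threshold(alpha: float, num_hypotheses: int) -> float:
    """Per-test significance level that keeps the family-wise error rate
    at or below alpha (Bonferroni correction)."""
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    return alpha / num_hypotheses


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 20, 100):
        # Show how the uncorrected error rate and the corrected threshold
        # change as more hypotheses are tested.
        print(m, family_wise_error_rate(alpha, m), bonferroni_threshold(alpha, m))
```

With 20 uncorrected tests at the 0.05 level, the chance of at least one false positive is roughly 64%, which is the kind of pitfall a nonexpert can easily miss.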
While the amount of data stored in current information systems continuously grows, and the processes making use of such data become more and more complex, extracting knowledge and getting insights from these data, as well as governing both data and the associated processes, are still challenging tasks. The problem is complicated by the proliferation of data sources and services both within a single organization and in cooperating environments. Effectively accessing, integrating and managing data in complex organizations is still one of the main issues faced by the information technology industry today. Indeed, it is not surprising that data scientists spend a comparatively large amount of time in the data preparation phase of a project, compared with the data mining and knowledge discovery phase. Whether you call it data wrangling, data munging, or data integration, it is estimated that 50%-80% of a data scientist's time is spent on collecting and organizing data for analysis.
The rapid advancement of artificial intelligence and its branches, such as machine learning and deep learning, which extract relevant information and generate insights from data to find sustainable and decisive solutions, is nothing new. But to run these algorithms, organizations need data and code. To translate this necessity into something meaningful, we need data science. While this discipline grows into an exciting and diverse technology that combines deep specialization with broad applications, we also realize the value it brings to the table. Further, data science helps organizations communicate with stakeholders and customers, track and analyze trends, and determine whether the collected data is actually of any help or simply a waste of a database farm.
This book comprehensively presents a novel approach to the systematic security hardening of software design models expressed in the standard UML language. It combines model-driven engineering and the aspect-oriented paradigm to integrate security practices into the early phases of the software development process. To this end, a UML profile has been developed for the specification of security hardening aspects on UML diagrams. In addition, a weaving framework, with the underlying theoretical foundations, has been designed for the systematic injection of security aspects into UML models. The work is organized as follows: chapter 1 presents an introduction to software security, model-driven engineering, UML and aspect-oriented technologies.
Biomedical event extraction is critical in understanding biomolecular interactions described in the scientific corpus. One of the main challenges is to identify nested structured events that are associated with non-indicative trigger words. We propose to incorporate domain knowledge from the Unified Medical Language System (UMLS) into a pre-trained language model via Graph Edge-conditioned Attention Networks (GEANet) and hierarchical graph representation. To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS. The grounded graphs are then propagated by GEANet, a novel graph neural network for enhanced capabilities in inferring complex events. On the BioNLP 2011 GENIA Event Extraction task, our approach achieved 1.41% F1 and 3.19% F1 improvements on all events and complex events, respectively. Ablation studies confirm the importance of GEANet and the hierarchical KG.
Entity-based semantic search has been widely adopted in modern search engines to improve search accuracy by understanding users' intent. In e-commerce, an accurate and complete product type (PT) ontology is essential for recognizing product entities in queries and retrieving relevant products from the catalog. However, finding product types (PTs) to construct such an ontology is usually expensive because of the considerable human effort it may involve. In this work, we propose an active learning framework that efficiently utilizes domain experts' knowledge for PT discovery. We also show the quality and coverage of the resulting PTs in the experiment results.
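The general idea of spending expert effort where it helps most can be sketched with pool-based uncertainty sampling. This is a generic active learning strategy chosen for illustration; the abstract does not state which query strategy the authors use. The candidate phrases, the scores, and the names `select_for_labeling` and `active_learning_round` are all hypothetical.

```python
from typing import Callable, Dict, List


def select_for_labeling(scores: Dict[str, float], batch_size: int) -> List[str]:
    """Pick the candidates whose predicted probability of being a product
    type is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(scores, key=lambda phrase: abs(scores[phrase] - 0.5))
    return ranked[:batch_size]


def active_learning_round(
    scores: Dict[str, float],
    ask_expert: Callable[[str], bool],
    batch_size: int,
) -> Dict[str, bool]:
    """Send only the most uncertain candidates to the domain expert and
    return the labels collected in this round."""
    chosen = select_for_labeling(scores, batch_size)
    return {phrase: ask_expert(phrase) for phrase in chosen}


if __name__ == "__main__":
    # Hypothetical model scores for candidate phrases mined from queries.
    candidate_scores = {
        "running shoes": 0.97,
        "blue": 0.04,
        "phone case": 0.55,
        "gift": 0.42,
        "laptop stand": 0.61,
    }
    # Stand-in for a human expert: a fixed set of accepted product types.
    accepted = {"running shoes", "phone case", "laptop stand"}
    labels = active_learning_round(candidate_scores, lambda p: p in accepted, 2)
    print(labels)
```

In a full loop, the new labels would be used to retrain the scoring model before the next round, so that confident candidates such as "running shoes" or "blue" never consume expert time.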