Ontologies
Semantics-Empowered Big Data Processing with Applications
Thirunarayan, Krishnaprasad (Kno.e.sis: Ohio Center of Excellence in Knowledge-Enabled Computing) | Sheth, Amit (Kno.e.sis: Ohio Center of Excellence in Knowledge-Enabled Computing)
We discuss the nature of big data and address the role of semantics in analyzing and processing big data that arises in the context of physical-cyber-social systems. To handle volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle variety, we resort to semantic models and annotations of data so that intelligent processing can be done independent of the heterogeneity of data formats and media. To handle velocity, we seek to use continuous semantics capability to dynamically create event- or situation-specific models and recognize relevant new concepts, entities, and facts. To handle veracity, we explore trust models and approaches to glean trustworthiness. These four v's of big data are harnessed by semantics-empowered analytics to derive value and to support applications transcending the physical-cyber-social continuum.
Why the Data Train Needs Semantic Rails
Janowicz, Krzysztof (University of California, Santa Barbara) | Harmelen, Frank van (Vrije Universiteit Amsterdam) | Hendler, James A. (Rensselaer Polytechnic Institute) | Hitzler, Pascal (Wright State University)
While catchphrases such as big data, smart data, data-intensive science, or smart dust highlight different aspects, they share a common theme: namely, a shift towards a data-centric perspective in which the synthesis and analysis of data at an ever-increasing spatial, temporal, and thematic resolution promises new insights, while, at the same time, reducing the need for strong domain theories as starting points. In terms of the envisioned methodologies, those catchphrases tend to emphasize the role of predictive analytics, that is, statistical techniques including data mining and machine learning, as well as supercomputing. Interestingly, however, while this perspective takes the availability of data as a given, it does not answer the question of how one would discover the required data in today’s chaotic information universe, how one would understand which datasets can be meaningfully integrated, and how to communicate the results to humans and machines alike. The semantic web addresses these questions. In the following, we argue why the data train needs semantic rails. We point out that making sense of data and gaining new insights works best if inductive and deductive techniques go hand-in-hand instead of competing over the prerogative of interpretation.
A New Look at Ontology Correctness
Aameri, Bahar (University of Toronto) | Gruninger, Michael (University of Toronto)
The design of ontologies for new commonsense domains continues to pose challenges, particularly in cases where multiple potential axiomatizations satisfy the requirements for the ontology. One approach is to specify the requirements with respect to the intended semantics of the terminology; from a mathematical perspective, the requirements may be characterized by the class of structures (referred to as the required models) which capture the intended semantics. This approach leads to a natural notion of correctness as a relationship between the models of the axiomatization of the ontology and the required models for the ontology. In this paper, we consider three possible generalizations of the notion of the correctness of an ontology in the case in which the ontology and the required models have different signatures. We show that these notions of correctness lead to different approaches for ontology evaluation and discuss the benefits and drawbacks of each approach.
Towards Ontologies in Variation
Hahmann, Torsten (University of Maine) | McIlraith, Sheila A. (University of Toronto)
In this extended abstract we examine the principles that underlie the construction of what we call Ontologies in Variation: a human-comprehensible knowledge representation scheme for natural kinds, objects, and concepts that captures both prototypical (or canonical) properties of classes of objects and those properties that are in variation. A fundamental characteristic of our work is that the variability captured in our representation is derived from data, and as such the provenance of the statistical knowledge (the dataset) is directly associated with the ontology. This reliance on empirical data directs us towards a frequentist view of variation as statistical assertions, in contrast to much of the current work that integrates logic and uncertainty. Our formalism's novelty lies in the strategic complementation of axiomatic knowledge by statistical knowledge, and in the desire to preserve human comprehension of the resulting representation. We illustrate this work in the context of an ongoing project to create a representation of human anatomy: a queryable digital anatomy book that fits all of us in some variation.
Learning New Relations from Concept Ontologies Derived from Definitions
Orfan, Jansen (University of Rochester) | Allen, James (University of Rochester)
Systems that build general knowledge bases from concept definitions mostly focus on knowledge extraction techniques on a per-definition basis. However, definitions rely on subtext and other definitions to concisely encode a concept's meaning. We present a probabilistic inference process where we systematically augment knowledge extracted from several WordNet glosses with subtext and then infer likely states of the world. From those states we learn new semantic relations among properties, states, and events. We show that our system learns more relations than one without subtext and verify this knowledge using human evaluators.
An Activity-Based Ontology for Dates
Gruninger, Michael (University of Toronto) | Katsumi, Megan (University of Toronto)
The representation of dates and their relationship to time and duration has long been recognized as an important problem in commonsense reasoning. However, existing date ontologies, such as OWL-Time and the Date-Time Foundation Vocabulary from the Object Management Group, take either overly simplistic or convoluted approaches to defining the key semantics for dates. We show that such approaches are inadequate and provide an improved solution: a first-order Date Ontology that is an extension of the Process Specification Language and an existing duration ontology. Rather than treat dates as a class of timepoints, we axiomatize dates as a class of complex activities which have multiple periodic occurrences. We consider two modules of the Date Ontology, and characterize the models of the Date Ontology up to elementary equivalence.
Leveraging Ontologies to Improve Model Generalization Automatically with Online Data Sources
Janpuangtong, Sasin (Texas A&M University) | Shell, Dylan A. (Texas A&M University)
This paper describes an end-to-end learning framework that allows a novice to create a model from data easily by helping structure the model building process and capturing extended aspects of domain knowledge. By treating the whole modeling process interactively and exploiting high-level knowledge in the form of an ontology, the framework is able to aid the user in a number of ways, including helping to avoid pitfalls such as data dredging. Prudence must be exercised to avoid these hazards: certain conclusions may be supported by extra knowledge if, for example, there are reasons to trust a particular narrower set of hypotheses. This paper adopts the solution of using higher-level knowledge in order to allow this sort of domain knowledge to be inferred automatically, thereby selecting only relevant input attributes and thence constraining the hypothesis space. We describe how the framework automatically exploits structured knowledge in an ontology to identify relevant concepts, and how a data extraction component can make use of online data sources to find measurements of those concepts so that their relevance can be evaluated. To validate our approach, models of four different problem domains were built using our implementation of the framework. Prediction errors of these models on unseen examples show that our framework, making use of the ontology, helps to improve model generalization.
From Classical to Consistent Query Answering under Existential Rules
Lukasiewicz, Thomas (University of Oxford) | Martinez, Maria Vanina (Universidad Nacional del Sur and Consejo Nacional de Investigaciones Científicas y Técnicas CONICET) | Pieris, Andreas (Vienna University of Technology) | Simari, Gerardo I. (Universidad Nacional del Sur and Consejo Nacional de Investigaciones Científicas y Técnicas CONICET)
Querying inconsistent ontologies is an intriguing new problem that gave rise to a flourishing research activity in the description logic (DL) community. The computational complexity of consistent query answering under the main DLs is rather well understood; however, little is known about existential rules. The goal of the current work is to perform an in-depth analysis of the complexity of consistent query answering under the main decidable classes of existential rules enriched with negative constraints. Our investigation focuses on one of the most prominent inconsistency-tolerant semantics, namely, the AR semantics. We establish a generic complexity result, which demonstrates the tight connection between classical and consistent query answering. This result allows us to obtain in a uniform way a relatively complete picture of the complexity of our problem.
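As a rough illustration of the AR semantics mentioned in the abstract above: a query is entailed if it holds in every repair, that is, in every maximal consistent subset of the database. The sketch below makes strong simplifying assumptions that are not taken from the paper: there are no existential rules at all (only ground facts), each negative constraint is given as a forbidden pair of ground facts, and the helper names (`is_consistent`, `repairs`, `ar_entails`) and the example facts are hypothetical.

```python
from itertools import combinations

def is_consistent(facts, conflicts):
    """A fact set is consistent if it contains no forbidden pair of facts."""
    return not any(a in facts and b in facts for a, b in conflicts)

def repairs(database, conflicts):
    """Enumerate the maximal consistent subsets of the database.

    Brute force over all subsets (exponential), for illustration only.
    """
    facts = list(database)
    found = []
    # Visit larger subsets first: a consistent candidate is maximal exactly
    # when it is not strictly contained in a repair found earlier.
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = frozenset(subset)
            if is_consistent(candidate, conflicts) and not any(
                candidate < r for r in found
            ):
                found.append(candidate)
    return found

def ar_entails(database, conflicts, query):
    """AR semantics (simplified): the query must hold in every repair."""
    return all(query(r) for r in repairs(database, conflicts))

# Hypothetical toy data: "ann" cannot be both a professor and a student.
database = {
    ("Professor", "ann"),
    ("Student", "ann"),
    ("Teaches", "ann", "db"),
}
conflicts = [(("Professor", "ann"), ("Student", "ann"))]

teaches_db = lambda facts: ("Teaches", "ann", "db") in facts
is_professor = lambda facts: ("Professor", "ann") in facts
```

Here the database has two repairs, one keeping each of the conflicting facts. `ar_entails(database, conflicts, teaches_db)` holds because the `Teaches` fact survives in both, whereas `ar_entails(database, conflicts, is_professor)` does not, since one repair drops the `Professor` fact.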
FACES: Diversity-Aware Entity Summarization Using Incremental Hierarchical Conceptual Clustering
Gunaratna, Kalpa (Kno.e.sis, Wright State University) | Thirunarayan, Krishnaprasad (Kno.e.sis, Wright State University) | Sheth, Amit (Kno.e.sis, Wright State University)
Semantic Web documents that encode facts about entities on the Web have been growing rapidly in size and evolving over time. Creating summaries of lengthy Semantic Web documents for quick identification of the corresponding entity has been of great contemporary interest. In this paper, we explore automatic summarization techniques that characterize and enable identification of an entity and create summaries that are human-friendly. Specifically, we highlight the importance of diversified (faceted) summaries by combining three dimensions: diversity, uniqueness, and popularity. Our novel diversity-aware entity summarization approach mimics human conceptual clustering techniques to group facts and picks representative facts from each group to form concise (i.e., short) and comprehensive (i.e., improved coverage through diversity) summaries. We evaluate our approach against the state-of-the-art techniques and show that our work improves both the quality and the efficiency of entity summarization.
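The idea of picking representative facts from each group can be sketched as follows. This is only a minimal illustration, not the FACES algorithm itself: it assumes the facts have already been clustered, it uses a single made-up score per fact as a stand-in for the combination of uniqueness and popularity, and the function name `summarize`, the example facts, and the scores are all hypothetical.

```python
def summarize(clusters, score, k):
    """Pick up to k facts, visiting clusters round-robin, best-scored first.

    Taking one fact per cluster before revisiting any cluster is what
    gives the summary its diversity in this sketch.
    """
    ranked = [sorted(c, key=score, reverse=True) for c in clusters if c]
    summary = []
    depth = 0
    while len(summary) < k and any(depth < len(c) for c in ranked):
        for c in ranked:
            if depth < len(c) and len(summary) < k:
                summary.append(c[depth])
        depth += 1
    return summary

# Hypothetical pre-clustered facts (property, value) about one entity.
clusters = [
    [("residence", "CityB"), ("birthPlace", "CityA")],
    [("spouse", "PersonX")],
    [("party", "PartyP"), ("office", "OfficeO")],
]
# Hypothetical scores standing in for uniqueness and popularity.
scores = {
    ("birthPlace", "CityA"): 0.9,
    ("residence", "CityB"): 0.4,
    ("spouse", "PersonX"): 0.7,
    ("office", "OfficeO"): 0.8,
    ("party", "PartyP"): 0.5,
}

top3 = summarize(clusters, scores.get, 3)
```

With `k = 3`, the summary contains the highest-scored fact from each of the three clusters, so no facet is repeated before every facet has been covered.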