Ontologies
Beyond OWL 2 QL in OBDA: Rewritings and Approximations
Botoeva, Elena (Free University of Bozen-Bolzano) | Calvanese, Diego (Free University of Bozen-Bolzano) | Santarelli, Valerio (Sapienza Università di Roma) | Savo, Domenico Fabio (Sapienza Università di Roma) | Solimando, Alessandro (University of Genova) | Xiao, Guohui (Free University of Bozen-Bolzano)
Ontology-based data access (OBDA) is a novel paradigm facilitating access to relational data, realized by linking data sources to an ontology by means of declarative mappings. DL-Lite_R, which is the logic underpinning the W3C ontology language OWL 2 QL and the current language of choice for OBDA, has been designed with the goal of delegating query answering to the underlying database engine, and thus is restricted in expressive power. E.g., it does not allow one to express disjunctive information, or any form of recursion on the data. The aim of this paper is to overcome these limitations of DL-Lite_R, and extend OBDA to more expressive ontology languages, while still leveraging the underlying relational technology for query answering. We achieve this by relying on two well-known mechanisms, namely conservative rewriting and approximation, but significantly extend their practical impact by bringing into the picture the mapping, an essential component of OBDA. Specifically, we develop techniques to rewrite OBDA specifications with an expressive ontology to "equivalent" ones with a DL-Lite_R ontology, if possible, and to approximate them otherwise. We do so by exploiting the high expressive power of the mapping layer to capture part of the domain semantics of rich ontology languages. We have implemented our techniques in the prototype system OntoProx, making use of the state-of-the-art OBDA system Ontop and the query answering system Clipper, and we have shown their feasibility and effectiveness with experiments on synthetic and real-world data.
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags
Tandon, Niket (Max Planck Institute for Informatics) | Hariman, Charles (Max Planck Institute for Informatics) | Urbani, Jacopo (Max Planck Institute for Informatics and VU University Amsterdam) | Rohrbach, Anna (Max Planck Institute for Informatics) | Rohrbach, Marcus (University of California, Berkeley) | Weikum, Gerhard (Max Planck Institute for Informatics)
Commonsense knowledge about part-whole relations (e.g., screen partOf notebook) is important for interpreting user input in web search and question answering, or for object detection in images. Prior work on knowledge base construction has compiled part-whole assertions, but with substantial limitations: i) semantically different kinds of part-whole relations are conflated into a single generic relation, ii) the arguments of a part-whole assertion are merely words with ambiguous meaning, iii) the assertions lack additional attributes like visibility (e.g., a nose is visible but a kidney is not) and cardinality information (e.g., a bird has two legs while a spider eight), iv) limited coverage of only tens of thousands of assertions. This paper presents a new method for automatically acquiring part-whole commonsense from Web contents and image tags at an unprecedented scale, yielding many millions of assertions, while specifically addressing the four shortcomings of prior work. Our method combines pattern-based information extraction methods with logical reasoning. We carefully distinguish different relations: physicalPartOf, memberOf, substanceOf. We consistently map the arguments of all assertions onto WordNet senses, eliminating the ambiguity of word-level assertions. We identify whether the parts can be visually perceived, and infer cardinalities for the assertions. The resulting commonsense knowledge base has very high quality and high coverage, with an accuracy of 89% determined by extensive sampling, and is publicly available.
UCO: A Unified Cybersecurity Ontology
Syed, Zareen (University of Maryland Baltimore County) | Padia, Ankur (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County) | Mathews, Lisa (University of Maryland, Baltimore County) | Joshi, Anupam (University of Maryland, Baltimore County)
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud (Berners-Lee, Bizer, and Heath 2009). Similar to DBpedia (Auer et al. 2007) which serves as the core for general knowledge in Linked Open Data cloud, we envision UCO to serve as the core for cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
Automatically Augmenting Titles of Research Papers for Better Discovery
Pallan, Madhavan (IBM Research - India) | Srivastava, Biplav (IBM Research - India)
It is well known that the title of an article impacts how well it is discovered by potential readers and read. With both people and search engines, acting on behalf of people, accessing papers from digital libraries, it is important that the paper titles should promote discovery. In this paper, we investigate the characteristics of titles of AI papers and then propose automatic ways to augment them so that they can be better indexed and discovered by users. A user study with researchers shows that they overwhelmingly prefer the augmented titles over the originals for being more helpful.
An Intelligent Dialogue Agent for the IoT Home
Jeon, Heesik (Samsung Electronics) | Oh, Hyung Rai (Samsung Electronics) | Hwang, Inchul (Samsung Electronics) | Kim, Jihie (Samsung Electronics)
In this paper, we propose an intelligent dialogue agent for the IoT home. The goal of the proposed system is to efficiently control IoT devices with natural spoken dialogue. This system is made up of the following components: Spoken Language Understanding for analyzing textual input and understanding user intention, Dialogue Management with a State Manager that consists of dialogue policies, Context Manager for understanding the environment, Action Planner responsible for generating a sequence of actions to achieve user intention, Things Manager for observing and controlling IoT devices, and Natural Language Generation that generates natural language from computer-based representation. This system is fully implemented in software and is evaluated in a real IoT home environment.
Combining Two and Three-Way Embedding Models for Link Prediction in Knowledge Bases
Garcia-Duran, Alberto, Bordes, Antoine, Usunier, Nicolas, Grandvalet, Yves
This paper tackles the problem of endogenous link prediction for knowledge base completion. Knowledge bases can be represented as directed graphs whose nodes correspond to entities and edges to relationships. Previous attempts consist either of powerful systems with high capacity to model complex connectivity patterns, which unfortunately usually end up overfitting on rare relationships, or of approaches that trade capacity for simplicity in order to fairly model all relationships, frequent or not. In this paper, we propose Tatec, a happy medium obtained by complementing a high-capacity model with a simpler one, both pre-trained separately and then combined. We present several variants of this model with different kinds of regularization and combination strategies and show that this approach outperforms existing methods on different types of relationships by achieving state-of-the-art results on four benchmarks from the literature.
An AI with 30 Years' Worth of Knowledge Finally Goes to Work
Having spent the past 31 years memorizing an astonishing collection of general knowledge, the artificial-intelligence engine created by Doug Lenat is finally ready to go to work. Lenat's creation is Cyc, a knowledge base of semantic information designed to give computers some understanding of how things work in the real world. Cyc has been given many thousands of facts, including lots of information that you wouldn't find in an encyclopedia because it seems self-evident. It knows, for example, that Sir Isaac Newton is a famous historical figure who is no longer alive. But more important, Cyc also understands that if you let go of an apple it will fall to the ground; that an apple is not bigger than a person; and that a person cannot throw an apple into space.
New metrics for learning and inference on sets, ontologies, and functions
Yang, Ruiyu, Jiang, Yuxiang, Hahn, Matthew W., Housworth, Elizabeth A., Radivojac, Predrag
We propose new metrics on sets, ontologies, and functions that can be used in various stages of probabilistic modeling, including exploratory data analysis, learning, inference, and result interpretation. These new functions unify and generalize some of the popular metrics on sets and functions, such as the Jaccard and bag distances on sets and Marczewski-Steinhaus distance on functions. We then introduce information-theoretic metrics on directed acyclic graphs drawn independently according to a fixed probability distribution and show how they can be used to calculate similarity between class labels for the objects with hierarchical output spaces (e.g., protein function). Finally, we provide evidence that the proposed metrics are useful by clustering species based solely on functional annotations available for subsets of their genes. The functional trees resemble evolutionary trees obtained by the phylogenetic analysis of their genomes.
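As a point of reference for the set metrics mentioned in this abstract, the following is a minimal sketch of the classical Jaccard distance only, not of the generalized metrics the paper proposes. The function name and the annotation labels are hypothetical and chosen purely for illustration.

```python
def jaccard_distance(a, b):
    """Classical Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets are at distance 0.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical functional annotations for three genes (made-up labels).
gene_x = {"binding", "catalysis", "transport"}
gene_y = {"catalysis", "transport", "signaling"}
gene_z = {"signaling"}

d_xy = jaccard_distance(gene_x, gene_y)  # 2 shared labels out of 4 in the union
d_yz = jaccard_distance(gene_y, gene_z)
d_xz = jaccard_distance(gene_x, gene_z)
```

Because the Jaccard distance is a true metric, the three distances above satisfy the triangle inequality, which is the property that makes such functions usable for clustering.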
A Lightweight Methodology for Rapid Ontology Engineering
We are living in a reality that, thanks to economic globalization and the Internet, is increasingly interconnected and complex. There is thus a growing need for semantic technology solutions that can help us better understand it, particularly from a conceptual point of view. Ontologies represent an essential component to developing the Web of Data (such as Linked Open Data) and Semantic Web applications. An ontology is a conceptual model of (a fragment of) an observed reality; it is, in essence, a repository of interlinked concepts pertaining to a given application domain. Traditionally, construction of an ontology (and its constant evolution, necessary to keep it aligned with reality) is lengthy and costly.
Formal Ontology Learning on Factual IS-A Corpus in English using Description Logics
Dasgupta, Sourish, Padia, Ankur, Shah, Kushal, Majumder, Prasenjit
Ontology Learning (OL) is the computational task of generating a knowledge base in the form of an ontology given an unstructured corpus whose content is in natural language (NL). Several works can be found in this area, most of which are limited to statistical and lexico-syntactic pattern matching based techniques (Light-Weight OL). These techniques do not lead to very accurate learning, mostly because of several linguistic nuances in NL. Formal OL is an alternative (less explored) methodology where deep linguistic analysis is made using theory and tools found in computational linguistics to generate formal axioms and definitions instead of simply inducing a taxonomy. In this paper we propose a "Description Logic (DL)" based formal OL framework for learning factual IS-A type sentences in English. We claim that semantic construction of IS-A sentences is non-trivial. Hence, we also claim that such sentences require special studies in the context of OL before any truly formal OL can be proposed. We introduce a learner tool, called DLOL_IS-A, that generates such ontologies in the OWL format. We have adopted "Gold Standard" based OL evaluation on the IS-A rich WCL v.1.1 dataset and our own Community representative IS-A dataset. We observed significant improvement of DLOL_IS-A when compared to the light-weight OL tool Text2Onto and the formal OL tool FRED.