 Ontologies


The Limits of Efficiency for Open- and Closed-World Query Evaluation Under Guarded TGDs

arXiv.org Artificial Intelligence

Ontology-mediated querying and querying in the presence of constraints are two key database problems where tuple-generating dependencies (TGDs) play a central role. In ontology-mediated querying, TGDs can formalize the ontology and thus derive additional facts from the given data, while in querying in the presence of constraints, they restrict the set of admissible databases. In this work, we study the limits of efficient query evaluation in the context of the above two problems, focussing on guarded and frontier-guarded TGDs and on UCQs as the actual queries. We show that a class of ontology-mediated queries (OMQs) based on guarded TGDs can be evaluated in FPT iff the OMQs in the class are equivalent to OMQs in which the actual query has bounded treewidth, up to some reasonable assumptions. For querying in the presence of constraints, we consider classes of constraint-query specifications (CQSs) that bundle a set of constraints with an actual query. We show a dichotomy result for CQSs based on guarded TGDs that parallels the one for OMQs except that, additionally, FPT coincides with PTime combined complexity. The proof is based on a novel connection between OMQ and CQS evaluation. Using a direct proof, we also show a similar dichotomy result, again up to some reasonable assumptions, for CQSs based on frontier-guarded TGDs with a bounded number of atoms in TGD heads. Our results on CQSs can be viewed as extensions of Grohe's well-known characterization of the tractable classes of CQs (without constraints). Like Grohe's characterization, all the above results assume that the arity of relation symbols is bounded by a constant. We also study the associated meta problems, i.e., whether a given OMQ or CQS is equivalent to one in which the actual query has bounded treewidth.
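As a rough illustration of the open-world reading described above, where TGDs derive additional facts from the given data, the following sketch applies a single hypothetical TGD, Employee(x) -> exists y. WorksFor(x, y), and then evaluates a small query. The relation names, the rule, and the function names are assumptions made for this example and do not come from the paper. The rule has a single body atom, so it is trivially guarded.

```python
from itertools import count

# Facts are tuples: (relation, arg1, arg2, ...).
# Hypothetical TGD used for illustration only:
#   Employee(x) -> exists y. WorksFor(x, y)
_fresh = count()  # source of names for labelled nulls


def apply_tgd(facts):
    """Apply the illustrative TGD once to every matching fact.

    The rule fires only when its head is not already satisfied,
    and the existential variable is filled with a fresh labelled null.
    """
    derived = set(facts)
    for fact in sorted(facts):
        if fact[0] != "Employee":
            continue
        x = fact[1]
        if any(f[0] == "WorksFor" and f[1] == x for f in derived):
            continue  # head already satisfied for this x
        derived.add(("WorksFor", x, f"_null{next(_fresh)}"))
    return derived


def works_somewhere_open(facts):
    """Answers to q(x) = exists y. WorksFor(x, y) after applying the TGD."""
    return {f[1] for f in apply_tgd(facts) if f[0] == "WorksFor"}


def works_somewhere_closed(facts):
    """Answers to the same query over the data alone (no derivation)."""
    return {f[1] for f in facts if f[0] == "WorksFor"}


database = {("Employee", "ann"), ("WorksFor", "bob", "acme")}
print(works_somewhere_open(database))
print(works_somewhere_closed(database))
```

With the TGD used as an ontology, "ann" becomes an answer even though the data names no employer for her. Under the constraint reading, the same database would instead be inadmissible, because it violates the rule.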


Use Case Driven Object Modeling with UML - Programmer Books

#artificialintelligence

Diagramming and process are important topics in today's software development world, as the UML diagramming language has come to be almost universally accepted. Yet process is necessary; by themselves, diagrams are of little use. Use Case Driven Object Modeling with UML – Theory and Practice combines the notation of UML with a lightweight but effective process – the ICONIX process – for designing and developing software systems. ICONIX has developed a growing following over the years. Sitting between the free-for-all of Extreme Programming and overly rigid processes such as RUP, ICONIX offers just enough structure to be successful.


Semantic integration of disease-specific knowledge

arXiv.org Artificial Intelligence

Motivation: Biomedical researchers working on a specific disease need up-to-date and unified access to knowledge relevant to the disease of their interest. Knowledge is continuously accumulated in scientific literature and other resources such as biomedical ontologies. Identifying the specific information needed is a challenging task and computational tools can be valuable. In this study, we propose a pipeline to automatically retrieve and integrate relevant knowledge based on a semantic graph representation, the iASiS Open Data Graph. Results: The disease-specific semantic graph can provide easy access to resources relevant to specific concepts and individual aspects of these concepts, in the form of concept relations and attributes. The proposed approach is applied to three different case studies: two prevalent diseases, Lung Cancer and Dementia, for which a lot of knowledge is available, and one rare disease, Duchenne Muscular Dystrophy, for which knowledge is less abundant and difficult to locate. Results from exemplary queries are presented, investigating the potential of this approach in integrating and accessing knowledge as an automatically generated semantic graph.


Design and Implementation of Linked Planning Domain Definition Language

arXiv.org Artificial Intelligence

Planning is a critical component of any artificial intelligence system that concerns the realization of strategies or action sequences typically for intelligent agents and autonomous robots. Given predefined parameterized actions, a planning service should accept a query with the goal and initial state to give a solution with a sequence of actions applied to environmental objects. This paper addresses the problem by providing a repository of actions generically applicable to various environmental objects based on Semantic Web technologies. Ontologies are used for asserting constraints in common sense as well as for resolving compatibilities between actions and states. Constraints are defined using Web standards such as SPARQL and SHACL to allow conditional predicates. We demonstrate the usefulness of the proposed planning domain description language with our robotics applications.


Polynomial Rewritings from Expressive Description Logics with Closed Predicates to Variants of Datalog

arXiv.org Artificial Intelligence

In many scenarios, complete and incomplete information coexist. For this reason, the knowledge representation and database communities have long shown interest in simultaneously supporting the closed- and the open-world views when reasoning about logic theories. Here we consider the setting of querying possibly incomplete data using logic theories, formalized as the evaluation of an ontology-mediated query (OMQ) that pairs a query with a theory, sometimes called an ontology, expressing background knowledge. This can be further enriched by specifying a set of closed predicates from the theory that are to be interpreted under the closed-world assumption, while the rest are interpreted with the open-world view. In this way we can retrieve more precise answers to queries by leveraging the partial completeness of the data. The central goal of this paper is to understand the relative expressiveness of OMQ languages in which the ontology is written in the expressive Description Logic (DL) ALCHOI and includes a set of closed predicates. We consider a restricted class of conjunctive queries. Our main result is to show that every query in this non-monotonic query language can be translated in polynomial time into Datalog with negation under the stable model semantics. To overcome the challenge that Datalog has no direct means to express the existential quantification present in ALCHOI, we define a two-player game that characterizes the satisfaction of the ontology, and design a Datalog query that can decide the existence of a winning strategy for the game. If there are no closed predicates, that is, in the case of querying a plain ALCHOI knowledge base, our translation yields a positive disjunctive Datalog program of polynomial size. To the best of our knowledge, unlike previous translations for related fragments with expressive (non-Horn) DLs, these are the first polynomial time translations.


Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

arXiv.org Machine Learning

Proceedings of Machine Learning Research XX:1-12, 2019. Machine Learning for Health (ML4H) at NeurIPS 2019.
Authors: Marta Skreta (1,2), martaskreta@cs.toronto.edu; Michael Brudno (1,2), brudno@cs.toronto.edu
Affiliations: (1) University of Toronto, Department of Computer Science; (2) The Hospital for Sick Children, Center for Computational Medicine; (3) Vector Institute for Artificial Intelligence, Toronto, Canada
Abstract: Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model's ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window) can significantly improve the model's representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2.
1. Introduction: Health care practitioners typically use abbreviations when preparing clinical records, saving time and space at the cost of increased ambiguity.


OpenBioLink: A resource and benchmarking framework for large-scale biomedical link prediction

arXiv.org Artificial Intelligence

Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms.


Direct Mappings between RDF and Property Graph Databases

arXiv.org Artificial Intelligence

RDF [21] and graph databases [27] are two approaches for data management that are based on modeling, storing and querying graph-like data. The database systems based on these models are gaining relevance in the industry due to their use in various application domains where complex data analytics is required [2]. RDF triplestores and graph database systems are tightly connected as they are based on graph data models. RDF databases are based on the RDF data model [21], their standard query language is SPARQL [15], and RDF Schema [8] allows describing classes of resources and properties (i.e., the data schema). On the other hand, most graph databases are based on the Property Graph (PG) data model, but there is no standard query language and no standard notion of property graph schema [25]. Therefore, RDF and PG database systems are dissimilar in data model, schema constraints and query language.
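To make the difference between the two data models concrete, the sketch below converts a handful of RDF-style triples into a property-graph-style structure. It is a naive illustration under stated assumptions, not the mapping defined in the paper: IRIs are plain strings, literals are wrapped in a hypothetical `Literal` class, triples with literal objects become node properties, and all other triples become labelled edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking an RDF literal value."""
    value: object


def rdf_to_property_graph(triples):
    """Naively map (subject, predicate, object) triples to nodes and edges.

    Returns a pair (nodes, edges) where nodes maps a node id to its
    property dict and edges is a list of (source, label, target).
    """
    nodes = {}
    edges = []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if isinstance(o, Literal):
            # Literal object: store as a property on the subject node.
            # A repeated predicate overwrites the earlier value here.
            nodes[s][p] = o.value
        else:
            # Resource object: create the target node and a labelled edge.
            nodes.setdefault(o, {})
            edges.append((s, p, o))
    return nodes, edges


triples = [
    ("ex:alice", "ex:name", Literal("Alice")),
    ("ex:alice", "ex:knows", "ex:bob"),
]
nodes, edges = rdf_to_property_graph(triples)
print(nodes)
print(edges)
```

The sketch ignores blank nodes, datatypes, multi-valued properties, and schema information, all of which a faithful mapping would have to handle.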


Ontologies for the Virtual Materials Marketplace

arXiv.org Artificial Intelligence

The Virtual Materials Marketplace (VIMMP) project, which develops an open platform for providing and accessing services related to materials modelling, is presented with a focus on its ontology development and data technology aspects. Within VIMMP, a system of marketplace-level ontologies is developed to characterize services, models, and interactions between users; the European Materials and Modelling Ontology (EMMO), which is based on mereotopology following Varzi and semiotics following Peirce, is employed as a top-level ontology. The ontologies are used to annotate data that are stored in the ZONTAL Space component of VIMMP and to support the ingest and retrieval of data and metadata at the VIMMP marketplace frontend.


An Introduction to Artificial Intelligence Applied to Multimedia

#artificialintelligence

In this chapter, we give an introduction to symbolic artificial intelligence (AI) and discuss its relation and application to multimedia. We begin by defining what symbolic AI is, what distinguishes it from non-symbolic approaches, such as machine learning, and how it can be used in the construction of advanced multimedia applications. We then introduce description logic (DL) and use it to discuss symbolic representation and reasoning. DL is the logical underpinning of OWL, the most successful family of ontology languages. After discussing DL, we present OWL and related Semantic Web technologies, such as RDF and SPARQL.