Ontology and Data Science


If you are new to the word "ontology", don't worry: I'm going to give a primer on what it is and why it matters for the data world. I'll be explicit about the difference between philosophical ontology and ontology as it relates to information and data in computer science. In simple words, ontology is the study of what there is. But there is another part to that definition that will help us in the following sections: ontology is usually also taken to encompass problems about the most general features and relations of the entities that do exist. Ontology opens new doors for what there is, too.

In Between Years. The Year of the Graph Newsletter: January 2019


In between years, or zwischen den Jahren, is a German expression for the period between Christmas and New Year. This is traditionally a time of year when not much happens, and this playful expression lingers in between the literal and the metaphoric. As the first edition of the Year of the Graph newsletter is here, a short retrospective may be due in addition to the usual updates. When we called 2018 the Year of the Graph, we did not have to wait for the Gartners of the world to verify what we saw coming. We can say without a doubt that this has been the year graphs went mainstream.

Can machines have common sense? – Moral Robots – Medium


The Cyc project (initially planned from 1984 to 1994) is the world's longest-lived AI project. The idea was to create a machine with "common sense," and it was predicted that about 10 years should suffice to see significant results. That didn't quite work out, and today, after 35 years, the project is still going on -- although by now very few experts still believe in the promises made by Cyc's developers. Common sense is more than just explaining the meaning of words. For example, we have already seen how "sibling" or "daughter" can be explained in Prolog with a dictionary-like definition.
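To make the idea of a dictionary-like definition concrete, here is a minimal sketch in Python rather than Prolog. The family facts and the names PARENT_OF, FEMALE, is_sibling and is_daughter are hypothetical and chosen only for illustration. Definitions like these explain what the words mean, but they carry none of the background knowledge that common sense requires.

```python
# Hypothetical facts: (parent, child) pairs and a set of female individuals.
PARENT_OF = {
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("erin", "frank"),
}
FEMALE = {"alice", "carol", "erin"}


def parents(person):
    """Return the set of known parents of a person."""
    return {p for (p, c) in PARENT_OF if c == person}


def is_sibling(a, b):
    """Two distinct people are siblings if they share at least one parent."""
    return a != b and bool(parents(a) & parents(b))


def is_daughter(child, parent):
    """A daughter is a female child of the given parent."""
    return (parent, child) in PARENT_OF and child in FEMALE
```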

The Future Is Now, So Embrace AI


"When you look at today's technology and how rapidly it's progressing across multiple industries, you wonder what the future holds for all of us," he continues. "Most products and services today are now part of a complex business ecosystem." As chair and professor of information systems at ASU's W.P. Carey School of Business, Santanam is known for his expertise on the impacts of technology on businesses, society and consumers. His research interests include mobile platforms, e-commerce, health information technology and cloud computing. Santanam is ready to help current and future business leaders get up to speed on these concepts in his one-day workshop, "The Future of Work and Digital Innovation."

More Effective Ontology Authoring with Test-Driven Development

arXiv.org Artificial Intelligence

Faculty of Computing, Poznan University of Technology, Poland, agnieszka.lawrynowicz@cs.put.poznan.pl

Abstract

Ontology authoring is a complex process, where commonly the automated reasoner is invoked for verification of newly introduced changes, therewith amounting to a time-consuming test-last approach. Test-Driven Development (TDD) for ontology authoring is a recent test-first approach that aims to reduce authoring time and increase authoring efficiency. Current TDD testing falls short on coverage of OWL features and possible test outcomes, the rigorous foundation thereof, and evaluations to ascertain its effectiveness. We aim to address these issues in one instantiation of TDD for ontology authoring. We first propose a succinct, logic-based model of TDD testing and present novel TDD algorithms so as to cover also any OWL 2 class expression for the TBox and for the principal ABox assertions, and prove their correctness. The algorithms use methods from the OWL API directly such that reclassification is not necessary for test execution, therewith reducing ontology authoring time. The algorithms were implemented in TDDonto2, a Protégé plugin. TDDonto2 was evaluated on editing efficiency and by users. The editing efficiency study demonstrated that it is faster than a typical ontology authoring interface, especially for medium-size and large ontologies. The user evaluation demonstrated that modellers make significantly fewer errors with TDDonto2 compared to the standard Protégé interface and complete their tasks better using less time. Thus, the results indicate that Test-Driven Development is a promising approach in an ontology development methodology.

Keywords: Ontology Engineering, Test-Driven Development, OWL

1. Introduction

Ontology engineering is facilitated by methods and methodologies, and tooling support for them. The methodologies are mostly information system-like, high-level directions, such as variants on waterfall and lifecycle development [1, 2], although more recently, notions of Agile development are being ported to the ontology development setting, e.g., [3, 4], including testing in some form [5, 6, 7, 8].
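The test-first idea can be illustrated with a toy sketch in Python. This is not TDDonto2 and does not use the OWL API: ToyOntology and run_test are hypothetical names, the "ontology" holds only told subclass axioms, and entailment is approximated by following those axioms transitively. The test is written first, fails, and passes once the missing axiom is added.

```python
class ToyOntology:
    """Hypothetical stand-in for an ontology: only told subclass axioms."""

    def __init__(self):
        self.subclass_axioms = set()

    def add_subclass(self, sub, sup):
        """Add the axiom 'sub is a subclass of sup'."""
        self.subclass_axioms.add((sub, sup))

    def is_subclass(self, sub, sup):
        """Follow told axioms transitively (no real reasoner is involved)."""
        seen, frontier = set(), [sub]
        while frontier:
            current = frontier.pop()
            if current == sup:
                return True
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(b for (a, b) in self.subclass_axioms if a == current)
        return False


def run_test(ontology, sub, sup):
    """Report whether the expected subclass relation currently holds."""
    return "passed" if ontology.is_subclass(sub, sup) else "failed"


# Test-first workflow: state the expectation, watch it fail, then add the axiom.
onto = ToyOntology()
onto.add_subclass("Lion", "Mammal")
before = run_test(onto, "Lion", "Animal")
onto.add_subclass("Mammal", "Animal")
after = run_test(onto, "Lion", "Animal")
```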

Improved Knowledge Graph Embedding using Background Taxonomic Information

arXiv.org Machine Learning

Knowledge graphs are used to represent relational information in terms of triples. To enable learning about domains, embedding models, such as tensor factorization models, can be used to make predictions of new triples. Often there is background taxonomic information (in terms of subclasses and subproperties) that should also be taken into account. We show that existing fully expressive (a.k.a. universal) models cannot provably respect subclass and subproperty information. We show that minimal modifications to an existing knowledge graph completion method enable injection of taxonomic information. Moreover, we prove that our model is fully expressive, assuming a lower bound on the size of the embeddings. Experimental results on public knowledge graphs show that, despite its simplicity, our approach is surprisingly effective.
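What it means for triples to respect subclass and subproperty information can be shown with a small symbolic sketch in Python. This is not the embedding model from the paper; it only materializes the triples that the taxonomy entails. The function name entailed_triples, the relation name "type" and the example entities are all assumptions made for illustration.

```python
def entailed_triples(triples, subclass_of, subproperty_of):
    """Close a set of (head, relation, tail) triples under two rules:
    (e, type, C) and C subclass of D  =>  (e, type, D)
    (h, r, t)    and r subproperty of s  =>  (h, s, t)
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (h, r, t) in list(closed):
            new = set()
            if r == "type":
                new |= {(h, "type", sup) for (sub, sup) in subclass_of if sub == t}
            new |= {(h, sup, t) for (sub, sup) in subproperty_of if sub == r}
            if not new <= closed:
                closed |= new
                changed = True
    return closed


# Hypothetical example data.
triples = {("rex", "type", "Dog"), ("ann", "owns_pet", "rex")}
subclass_of = {("Dog", "Mammal"), ("Mammal", "Animal")}
subproperty_of = {("owns_pet", "owns")}
closure = entailed_triples(triples, subclass_of, subproperty_of)
```

A model that respects the taxonomy should score every triple in closure as true whenever it scores the original triples as true.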

Efficient Concept Induction for Description Logics

arXiv.org Artificial Intelligence

Concept Induction refers to the problem of creating complex Description Logic class descriptions (i.e., TBox axioms) from instance examples (i.e., ABox data). In this paper we look particularly at the case where both a set of positive and a set of negative instances are given, and complex class expressions are sought under which the positive but not the negative examples fall. Concept induction has found applications in ontology engineering, but existing algorithms have fundamental performance issues in some scenarios, mainly because a high number of invocations of an external Description Logic reasoner is usually required. In this paper we present a new algorithm for this problem which drastically reduces the number of reasoner invocations needed. While this comes at the expense of a more limited traversal of the search space, we show that our approach improves execution times by up to several orders of magnitude, while output correctness, measured in the amount of correct coverage of the input instances, remains reasonably high in many cases. Our approach thus should provide a strong alternative to existing systems, in particular in settings where other systems are prohibitively slow.
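The problem setting can be sketched in Python with a deliberately simplified example. It is not the algorithm from the paper: class expressions are restricted to conjunctions of atomic classes, membership is read directly from a hypothetical ABOX dictionary instead of a reasoner, and the search is brute force. The names ABOX, covers, coverage_score and best_conjunction are assumptions.

```python
from itertools import combinations

# Hypothetical ABox: each individual mapped to the atomic classes it belongs to.
ABOX = {
    "a": {"Bird", "Flies"},
    "b": {"Bird", "Flies"},
    "c": {"Bird"},
    "d": {"Mammal", "Flies"},
}


def covers(conjunction, individual):
    """An individual falls under a conjunction if it belongs to every conjunct."""
    return set(conjunction) <= ABOX[individual]


def coverage_score(conjunction, positives, negatives):
    """Fraction of examples handled correctly: positives covered, negatives excluded."""
    true_pos = sum(covers(conjunction, i) for i in positives)
    true_neg = sum(not covers(conjunction, i) for i in negatives)
    return (true_pos + true_neg) / (len(positives) + len(negatives))


def best_conjunction(positives, negatives, max_size=2):
    """Try every conjunction of up to max_size atomic classes and keep the best one."""
    atoms = sorted(set().union(*ABOX.values()))
    candidates = [c for n in range(1, max_size + 1) for c in combinations(atoms, n)]
    return max(candidates, key=lambda c: coverage_score(c, positives, negatives))


best = best_conjunction(["a", "b"], ["c", "d"])
```

With positives a, b and negatives c, d, the conjunction of Bird and Flies covers both positives and neither negative.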

Project Rosetta: A Childhood Social, Emotional, and Behavioral Developmental Ontology

arXiv.org Artificial Intelligence

There is a wide array of existing instruments used to assess childhood behavior and development for the evaluation of social, emotional and behavioral disorders. Many of these instruments either focus on one diagnostic category or encompass a broad set of childhood behaviors. We built an extensive ontology of the questions associated with key features that have diagnostic relevance for child behavioral conditions, such as Autism Spectrum Disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and anxiety, by incorporating a subset of existing child behavioral instruments and categorizing each question into clinical domains. Each existing question and set of question responses were then mapped to a new unique Rosetta question and set of answer codes encompassing the semantic meaning and identified concept(s) of as many existing questions as possible. This resulted in 1274 existing instrument questions mapping to 209 Rosetta questions, creating a minimal set of questions that comprehensively covers each topic and subtopic. The resulting ontology can be used to create more concise instruments across various ages and conditions, as well as to create more robust overlapping datasets for both clinical and research use.

SWRL2SPIN: A tool for transforming SWRL rule bases in OWL ontologies to object-oriented SPIN rules

arXiv.org Artificial Intelligence

Semantic Web Rule Language (SWRL) combines OWL (Web Ontology Language) ontologies with Horn Logic rules of the Rule Markup Language (RuleML) family. Being supported by ontology editors, rule engines and ontology reasoners, it has become a very popular choice for developing rule-based applications on top of ontologies. However, SWRL is probably not going to become a WWW Consortium standard, prohibiting industrial acceptance. On the other hand, SPIN (SPARQL Inferencing Notation) has become a de facto industry standard to represent SPARQL rules and constraints on Semantic Web models, building on the widespread acceptance of SPARQL (SPARQL Protocol and RDF Query Language). In this paper, we argue that the life of existing SWRL rule-based ontology applications can be prolonged by converting them to SPIN. To this end, we have developed the SWRL2SPIN tool in Prolog that transforms SWRL rules into SPIN rules, considering the object-orientation of SPIN, i.e. linking rules to the appropriate ontology classes and optimizing them, as derived by analysing the rule conditions.

Ontology Matching Techniques: A Gold Standard Model

arXiv.org Artificial Intelligence

Typically, an ontology matching technique is a combination of many different types of matchers operating at various abstraction levels, such as structure, semantics, syntax and instances. An ontology matching technique that employs matchers at all possible abstraction levels is expected to give, in general, the best results in terms of precision, recall and F-measure, because it has more matching opportunities, provided we discount efficiency issues, which may improve with better computing resources such as parallel processing. A gold standard ontology matching model is derived from a model classification of ontology matching techniques. A suitable metric is also defined based on the gold standard ontology matching model. A review of various ontology matching techniques specified in recent research papers in the area was undertaken to categorize each technique according to the newly proposed gold standard model, and a metric value for the whole group was computed. The results of the study support the proposed gold standard ontology matching model.
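Precision, recall and F-measure for a matcher are usually computed by comparing the correspondences it finds against a reference alignment. The Python sketch below shows that standard calculation only; it is not the metric proposed in the paper, and the function name alignment_scores and the example correspondences are hypothetical.

```python
def alignment_scores(found, reference):
    """Return (precision, recall, F-measure) of found correspondences
    against a reference alignment, both given as collections of pairs."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical correspondences between entities of two ontologies.
found = {("Author", "Writer"), ("Paper", "Article"), ("Review", "Chair")}
reference = {
    ("Author", "Writer"),
    ("Paper", "Article"),
    ("Review", "Evaluation"),
    ("Venue", "Conference"),
}
precision, recall, f_measure = alignment_scores(found, reference)
```

Here two of the three found correspondences are correct and two of the four reference correspondences are recovered, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.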