Expert Systems
Loop Restricted Existential Rules and First-order Rewritability for Query Answering
Asuncion, Vernon, Zhang, Yan, Zhang, Heng
In ontology-based data access (OBDA), the classical database is enhanced with an ontology in the form of logical assertions generating new intensional knowledge. A powerful form of such logical assertions is the tuple-generating dependencies (TGDs), also called existential rules, where Horn rules are extended by allowing existential quantifiers to appear in the rule heads. In this paper we introduce a new language called loop restricted (LR) TGDs (existential rules), which are TGDs with certain restrictions on the loops embedded in the underlying rule set. We study the complexity of this new language. We show that conjunctive query answering (CQA) under the LR TGDs is decidable. In particular, we prove that this language satisfies the so-called bounded derivation-depth property (BDDP), which implies that the CQA is first-order rewritable, and its data complexity is in AC0. We also prove that the combined complexity of the CQA is EXPTIME-complete, while the language membership is PSPACE-complete. Then we extend the LR TGDs language to the generalised loop restricted (GLR) TGDs language, and prove that this class of TGDs remains first-order rewritable and properly contains most of the other first-order rewritable TGD classes discovered in the literature so far.
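To make the notion of an existential rule concrete, here is a small illustrative TGD. The predicates Person and HasParent are hypothetical and are not taken from the paper; the rule says that every person has a parent who is also a person, with the existential quantifier appearing in the rule head:

```latex
\forall x \, \bigl( \mathit{Person}(x) \rightarrow \exists y \, ( \mathit{HasParent}(x, y) \wedge \mathit{Person}(y) ) \bigr)
```

Applied to a fact such as Person(alice), this rule introduces a fresh unnamed individual as the parent, and the rule then applies again to that individual. Rules of this kind can therefore generate unboundedly many new facts, which is why syntactic restrictions on rule sets matter for the decidability of query answering.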
A New Decidable Class of Tuple Generating Dependencies: The Triangularly-Guarded Class
In this paper we introduce a new class of tuple-generating dependencies (TGDs) called triangularly-guarded TGDs, which are TGDs with certain restrictions on the atomic derivation track embedded in the underlying rule set. We show that conjunctive query answering under this new class of TGDs is decidable. We further show that this new class strictly contains some other decidable classes such as weak-acyclic, guarded, sticky and shy, which, to the best of our knowledge, provides a unified representation of all these aforementioned classes.
A Parallel/Distributed Algorithmic Framework for Mining All Quantitative Association Rules
Christou, Ioannis T., Amolochitis, Emmanouil, Tan, Zheng-Hua
We present QARMA, an efficient novel parallel algorithm for mining all Quantitative Association Rules in large multidimensional datasets where items are required to have at least a single common attribute to be specified in the rule's single consequent item. Given a minimum support level and a set of threshold criteria of interestingness measures such as confidence, conviction, etc., our algorithm guarantees the generation of all non-dominated Quantitative Association Rules that meet the minimum support and interestingness requirements. Such rules can be of great importance to marketing departments seeking to optimize targeted campaigns, or general market segmentation. They can also be of value in medical applications, as well as in the financial and predictive maintenance domains. We provide computational results showing the scalability of our algorithm, and its capability to produce all rules to be found in large-scale synthetic and real-world datasets such as MovieLens, within a few seconds or minutes of computational time on commodity hardware.
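The support, confidence and conviction thresholds mentioned above can be illustrated with a minimal sketch using the standard textbook definitions of these measures. This is not the QARMA algorithm: it ignores quantitative attribute ranges, rule dominance and parallelism, and the function names and toy transactions are assumptions made purely for illustration.

```python
from fractions import Fraction


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))


def confidence(transactions, antecedent, consequent):
    """supp(A and C) / supp(A); None when the antecedent never occurs."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        return None
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / supp_a


def conviction(transactions, antecedent, consequent):
    """(1 - supp(C)) / (1 - conf); infinite when the rule never fails."""
    conf = confidence(transactions, antecedent, consequent)
    if conf is None:
        return None
    if conf == 1:
        return float("inf")
    return (1 - support(transactions, consequent)) / (1 - conf)


def passes(transactions, antecedent, consequent, min_support, min_confidence):
    """Check a candidate rule against hypothetical minimum thresholds."""
    both = set(antecedent) | set(consequent)
    conf = confidence(transactions, antecedent, consequent)
    return (
        conf is not None
        and support(transactions, both) >= min_support
        and conf >= min_confidence
    )


# Hypothetical toy data: each transaction is a set of item labels.
TOY_TRANSACTIONS = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
```

Exact fractions are used so that threshold comparisons are not affected by floating-point rounding. For the rule a → b on the toy data, the support is 1/2, the confidence is 2/3 and the conviction is 3/4.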
Corpus-Level Fine-Grained Entity Typing
Yaghoobzadeh, Yadollah, Adel, Heike, Schuetze, Hinrich
Extracting information about entities remains an important research area. This paper addresses the problem of corpus-level entity typing, i.e., inferring from a large corpus that an entity is a member of a class, such as "food" or "artist". The application of entity typing we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (i) a global model that computes scores based on global information of an entity and (ii) a context model that first evaluates the individual occurrences of an entity and then aggregates the scores. Each of the two proposed models has specific properties. For the global model, learning high-quality entity representations is crucial because it is the only source used for the predictions. Therefore, we introduce representations using the name and contexts of entities on the three levels of entity, word, and character. We show that each level provides complementary information and a multi-level representation performs best. For the context model, we need to use distant supervision since there are no context-level labels available for entities. Distantly supervised labels are noisy and this harms the performance of models. Therefore, we introduce and apply new algorithms for noise mitigation using multi-instance learning. We show the effectiveness of our models on a large entity typing dataset built from Freebase.
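The aggregation step of a context model, which scores individual occurrences of an entity and then combines those scores, can be sketched as follows. The abstract does not say which aggregation function FIGMENT uses, so the mean and max options here are assumptions, as are the function name and the toy scores.

```python
from statistics import mean


def aggregate_context_scores(context_scores, how="mean"):
    """Combine per-mention type scores into one score per type.

    `context_scores` is a list with one dict per occurrence of the entity,
    mapping a type label to a score. Types missing from a dict count as 0.0.
    """
    if how not in ("mean", "max"):
        raise ValueError("how must be 'mean' or 'max'")
    combine = max if how == "max" else mean
    types = sorted({t for scores in context_scores for t in scores})
    return {
        t: combine([scores.get(t, 0.0) for scores in context_scores])
        for t in types
    }


# Hypothetical scores for two occurrences of the same entity.
TOY_SCORES = [
    {"food": 0.75, "artist": 0.25},
    {"food": 0.25, "artist": 0.5},
]
```

Calling `aggregate_context_scores(TOY_SCORES)` averages the two occurrences per type, while `how="max"` keeps the most confident occurrence, which is one simple way to reduce the influence of noisy individual contexts.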
Deep Transfer Network with Joint Distribution Adaptation: A New Intelligent Fault Diagnosis Framework for Industry Application
Han, Te, Liu, Chao, Yang, Wenguang, Jiang, Dongxiang
In recent years, deep learning models have become increasingly popular for intelligent condition monitoring, diagnosis and prognostics of mechanical systems and structures. In previous studies, however, a major assumption accepted by default is that the training and testing data are taken from the same feature distribution. Unfortunately, this assumption is mostly invalid in real applications, resulting in a certain lack of applicability for the traditional diagnosis approaches. Inspired by the idea of transfer learning that leverages the knowledge learnt from rich labeled data in the source domain to facilitate diagnosing a new but similar target task, a new intelligent fault diagnosis framework, i.e., deep transfer network (DTN), which generalizes the deep learning model to the domain adaptation scenario, is proposed in this paper. By extending the marginal distribution adaptation (MDA) to joint distribution adaptation (JDA), the proposed framework can exploit the discrimination structures associated with the labeled data in the source domain to adapt the conditional distribution of unlabeled target data, and thus guarantee a more accurate distribution matching. Extensive empirical evaluations on three fault datasets validate the applicability and practicability of DTN, while achieving many state-of-the-art transfer results in terms of diverse operating conditions, fault severities and fault types.
CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web
Lockard, Colin, Dong, Xin Luna, Einolghozati, Arash, Shiralkar, Prashant
The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high precision and recall only when manual annotations for each website are available. Although there have been efforts to learn extractors from automatically-generated labels, these methods are not sufficiently robust to succeed in settings with complex schemas and information-rich websites. In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision. We automatically generate training labels by aligning an existing knowledge base with a web page and leveraging the unique structural characteristics of semi-structured websites. We then train a classifier based on the potentially noisy and incomplete labels to predict new relation instances. Our method can compete with annotation-based techniques in the literature in terms of extraction quality. A large-scale experiment on over 400,000 pages from dozens of multi-lingual long-tail websites harvested 1.25 million facts at a precision of 90%.
Solving Bongard Problems with a Visual Language and Pragmatic Reasoning
Depeweg, Stefan, Rothkopf, Constantin A., Jäkel, Frank
More than 50 years ago Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities, only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Contrary to other concept learning problems, the examples from which concepts are induced are not random in Bongard problems; instead, they are carefully chosen to communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic reasoning into account, we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the biggest fraction yet.
Towards Collaborative Conceptual Exploration
In domains with high knowledge distribution a natural objective is to create principled foundations for collaborative interactive learning environments. We present a first mathematical characterization of a collaborative learning group, a consortium, based on closure systems of attribute sets and the well-known attribute exploration algorithm from formal concept analysis. To this end, we introduce (weak) local experts for subdomains of a given knowledge domain. These entities are able to refute and potentially accept a given (implicational) query for some closure system that is a restriction of the whole domain. On this we build up a consortial expert and show first insights about the ability of such an expert to answer queries. Furthermore, we describe techniques for coping with falsely accepted implications and for combining counterexamples. Using notions from combinatorial design theory we further expand those insights as far as providing first results on the decidability problem of whether a given consortium is able to explore some target domain. Applications in conceptual knowledge acquisition as well as in collaborative interactive ontology learning are at hand.
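The idea of a local expert that refutes or accepts an implicational query can be sketched as follows. This is a simplified reading and not the paper's formal definition: the expert is assumed to know a fixed set of example objects, to answer only queries whose attributes lie inside its subdomain, and to accept whenever it knows no counterexample. All names and data are hypothetical.

```python
def find_counterexample(known_objects, premise, conclusion):
    """Return the name of an object that has all premise attributes but
    lacks some conclusion attribute, or None if no such object is known."""
    premise, conclusion = frozenset(premise), frozenset(conclusion)
    for name, attrs in known_objects.items():
        if premise <= attrs and not conclusion <= attrs:
            return name
    return None


def ask_local_expert(known_objects, subdomain, premise, conclusion):
    """Answer the query `premise -> conclusion` from a local point of view.

    Returns ("abstain", None) if the query uses attributes outside the
    expert's subdomain, ("refute", name) if a counterexample is known,
    and ("accept", None) otherwise.
    """
    asked = frozenset(premise) | frozenset(conclusion)
    if not asked <= frozenset(subdomain):
        return ("abstain", None)
    witness = find_counterexample(known_objects, premise, conclusion)
    if witness is not None:
        return ("refute", witness)
    return ("accept", None)


# Hypothetical knowledge of one expert: object name -> attribute set.
TOY_OBJECTS = {
    "duck": {"flies", "swims"},
    "penguin": {"swims"},
    "sparrow": {"flies"},
}
TOY_SUBDOMAIN = {"flies", "swims"}
```

With this toy data, the query swims → flies is refuted by the penguin, while a query mentioning an attribute outside the subdomain is left for another member of the group. Because acceptance here only means that no counterexample is known, an accepted implication may still be false, which is the situation the abstract refers to as falsely accepted implications.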
Expeditious Generation of Knowledge Graph Embeddings
Soru, Tommaso, Ruberto, Stefano, Moussallem, Diego, Marx, Edgard, Esteves, Diego, Ngomo, Axel-Cyrille Ngonga
Knowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on tasks such as link prediction, entity recommendation, question answering, and triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases. In this paper, we propose KG2Vec, a novel approach to Knowledge Graph Embedding based on the skip-gram model. Instead of using a predefined scoring function, we learn it relying on Long Short-Term Memories. We evaluated the quality of our embeddings on knowledge graph completion and show that KG2Vec is comparable in quality to the scalable state-of-the-art approaches and can process large graphs by parsing more than a hundred million triples in less than 6 hours on common hardware.
Artificial Intelligence (AI), Healthcare and Regulatory Compliance
The media is replete with articles about how artificial intelligence (AI) is going to change the medical world, in cancer detection and other diagnostic and treatment disciplines. The articles describe how AI applications, primarily deep learning (DL) ones, are as accurate as or better than medical experts. That means they'll be quickly adopted, right? Not really; there's a regulatory picture many ignore. One of the first expert systems, a subset of AI, was MYCIN, initially developed as a doctoral dissertation by Edward Shortliffe at Stanford University.