Rule-Based Reasoning
Accelerating the Discovery of Data Quality Rules: A Case Study
Yeh, Peter Z. (Accenture) | Puri, Colin A. (Accenture) | Wagman, Mark (Accenture) | Easo, Ajay K (Accenture)
Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an application -- Data Quality Rules Accelerator (DQRA) -- that accelerates Data Quality (DQ) efforts (e.g. data profiling and cleansing) by automatically discovering DQ rules for detecting inconsistencies in data. We then present two evaluations. The first evaluation compares DQRA to existing solutions; and shows that DQRA either outperformed or achieved performance comparable with these solutions on metrics such as precision, recall, and runtime. The second evaluation is a case study where DQRA was piloted at a large utilities company to improve data quality as part of a legacy migration effort. DQRA was able to discover rules that detected data inconsistencies directly impacting revenue and operational efficiency. Moreover, DQRA was able to significantly reduce the amount of effort required to develop these rules compared to the state of the practice. Finally, we describe ongoing efforts to deploy DQRA.
Extensible Automated Constraint Modelling
Akgun, Ozgur (University of St. Andrews) | Miguel, Ian (University of St. Andrews) | Jefferson, Chris (University of St. Andrews) | Frisch, Alan M. (University of York) | Hnich, Brahim (Izmir University of Economics)
In constraint solving, a critical bottleneck is the formulation of aneffective constraint model of an input problem. The Conjure system describedin this paper, a substantial step forward over prototype versions of Conjurepreviously reported, makes a valuable contribution to the automation ofconstraint modelling by automatically producing constraint models from theirspecifications in the abstract constraint specification language Essence. Aset of rules is used to refine an abstract specification into a concreteconstraint model. We demonstrate that this set of rules is readily extensibleto increase the space of possible constraint models Conjure can produce. Ourempirical results confirm that Conjure can reproduce successfully the kernelsof the constraint models of 32 benchmark problems found in the literature.
WikiSimple: Automatic Simplification of Wikipedia Articles
Woodsend, Kristian (University of Edinburgh) | Lapata, Mirella (University of Edinburgh)
Text simplification aims to rewrite text into simpler versions and thus make information accessible to a broader audience (e.g., non-native speakers, children, and individuals with language impairments). In this paper, we propose a model that simplifies documents automatically while selecting their most important content and rewriting them in a simpler style. We learn content selection rules from same-topic Wikipedia articles written in the main encyclopedia and its Simple English variant. We also use the revision histories of Simple Wikipedia articles to learn a quasi-synchronous grammar of simplification rewrite rules. Based on an integer linear programming formulation, we develop a joint model where preferences based on content and style are optimized simultaneously. Experiments on simplifying main Wikipedia articles show that our method significantly reduces the reading difficulty, while still capturing the important content.
DISCO: Describing Images Using Scene Contexts and Objects
Nwogu, Ifeoma (University of Rochester) | Zhou, Yingbo (University at Buffalo, State University of New York) | Brown, Christopher (University of Rochester)
In this paper, we propose a bottom-up approach to generating short descriptive sentences from images, to enhance scene understanding. We demonstrate automatic methods for mapping the visual content in an image to natural spoken or written language. We also introduce a human-in-the-loop evaluation strategy that quantitatively captures the meaningfulness of the generated sentences. We recorded a correctness rate of 60.34% when human users were asked to judge the meaningfulness of the sentences generated from relatively challenging images. Also, our automatic methods compared well with the state-of-the-art techniques for the related computer vision tasks.
Reasoning and Proofing Services for Semantic Web Agents
Kravari, Kalliopi (Aristotle University of Thessaloniki) | Papatheodorou, Konstantinos (Institute of Computer Science and University of Crete) | Antoniou, Grigoris (Institute of Computer Science and University of Crete) | Bassiliades, Nick (Aristotle University of Thessaloniki)
The Semantic Web aims to offer an interoperable environment that will allow users to safely delegate complex actions to intelligent agents. Much work has been done for agents' interoperability; especially in the areas of ontology-based metadata and rule-based reasoning. Nevertheless, the SW proof layer has been neglected so far, although it is vital for agents and humans to understand how a result came about, in order to increase the trust in the interchanged information. This paper focuses on the implementation of third party SW reasoning and proofing services wrapped as agents in a multi-agent framework. This way, agents can exchange and justify their arguments without the need to conform to a common rule paradigm. Via external reasoning and proofing services, the receiving agent can grasp the semantics of the received rule set and check the validity of the inferred results.
Fusion of Multiple Features and Supervised Learning for Chinese OOV Term Detection and POS Guessing
Zhang, Yuejie (Fudan University) | Cen, Lei (Fudan University) | Wu, Wei (Fudan University) | Jin, Cheng (Fudan University) | Xue, Xiangyang (Fudan University)
In this paper, to support more precise Chinese Out-of-Vocabulary (OOV) term detection and Part-of-Speech (POS) guessing, a unified mechanism is proposed and formulated based on the fusion of multiple features and supervised learning. Besides all the traditional features, the new features for statistical information and global contexts are introduced, as well as some constraints and heuristic rules, which reveal the relationships among OOV term candidates. Our experiments on the Chinese corpora from both People’s Daily and SIGHAN 2005 have achieved the consistent results, which are better than those acquired by pure rule-based or statistics-based models. From the experimental results for combining our model with Chinese monolingual retrieval on the data sets of TREC-9, it is found that the obvious improvement for the retrieval performance can also be obtained.
A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch
Navigli, Roberto (Sapienza University of Rome) | Velardi, Paola (Sapienza University of Rome) | Faralli, Stefano (Sapienza University of Rome)
In this paper we present a graph-based approach aimed at learning a lexical taxonomy automatically starting from a domain corpus and the Web. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions and hypernyms. This results in a very dense, cyclic and possibly disconnected hypernym graph. The algorithm then induces a taxonomy from the graph. Our experiments show that we obtain high-quality results, both when building brand-new taxonomies and when reconstructing WordNet sub-hierarchies.
Heuristic Rule-Based Regression Via Dynamic Reduction to Classification
Janssen, Frederik (Technical University, Darmstadt) | Fürnkranz, Johannes (Technical University, Darmstadt)
In this paper, we propose a novel approach for learning regression rules by transforming the regression problem into a classification problem. Unlike previous approaches to regression by classification, in our approach the discretization of the class variable is tightly integrated into the rule learning algorithm. The key idea is to dynamically define a region around the target value predicted by the rule, and considering all examples within that region as positive and all examples outside that region as negative. In this way, conventional rule learning heuristics may be used for inducing regression rules. Our results show that our heuristic algorithm outperforms approaches that use a static discretization of the target variable, and performs en par with other comparable rule-based approaches, albeit without reaching the performance of statistical approaches.
Extending Decidable Existential Rules by Joining Acyclicity and Guardedness
Krötzsch, Markus (University of Oxford) | Rudolph, Sebastian (Karlsruhe Institute of Technology)
Existential rules, i.e. Datalog extended with existential quantifiers in rule heads, are currently studied under a variety of names such as Datalog +/-, ∀∃-rules, and tuple-generating dependencies. The renewed interest in this formalism is fuelled by a wealth of recently discovered language fragments for which query answering is decidable. This paper extends and consolidates two of the main approaches in this field — acyclicity and guardedness — by providing (1) complexity-preserving generalisations of weakly acyclic and weakly (frontier-)guarded rules, and (2) a novel formalism of glut-(frontier-)guarded rules that subsumes both. This builds on an insight that acyclicity can be used to extend any existential rule language while retaining decidability. Besides decidability, combined query complexities are established in all cases.
Walking the Complexity Lines for Generalized Guarded Existential Rules
Baget, Jean-François (INRIA) | Mugnier, Marie-Laure (University of Montpellier 2) | Rudolph, Sebastian (KIT) | Thomazo, Michaël (University of Montpellier 2)
We establish complexities of the conjunctive query entailment problem for classes of existential rules (i.e. Tuple-Generating Dependencies or Datalog+/- rules). Our contribution is twofold. First, we introduce the class of greedy bounded treewidth sets (gbts), which covers guarded rules, and their known generalizations, namely (weakly) frontier-guarded rules. We provide a generic algorithm for query entailment with gbts, which is worst-case optimal for combined complexity with bounded predicate arity, as well as for data complexity. Second, we classify several gbts classes, whose complexity was unknown, namely frontier-one, frontier-guarded and weakly frontier-guarded rules, with respect to combined complexity (with bounded and unbounded predicate arity) and data complexity.