Collaborating Authors

 Cate, Balder ten


On the Power and Limitations of Examples for Description Logic Concepts

arXiv.org Artificial Intelligence

We investigate the power of labeled examples for describing description-logic concepts. Specifically, we systematically study the existence and efficient computability of finite characterisations, i.e., finite sets of labeled examples that uniquely characterize a single concept, for a wide variety of description logics between EL and ALCQI, both without an ontology and in the presence of a DL-Lite ontology. Finite characterisations are relevant for debugging purposes, and their existence is a necessary condition for exact learnability

soltera2 is a positive example for C, and px10 and teslaY are negative examples for C. In fact, as it turns out, C is the only EL-concept (up to equivalence) that fits these three labeled examples. In other words, these three labeled examples "uniquely characterize" C within the class of all EL-concepts. This shows that the above three labeled examples are a good choice of examples. Adding any additional examples would be redundant. Note, however, that this depends on the choice of description logic. For instance, the richer concept language ALC allows for other concept expressions such as Bicycle Contains.Basket that also fit.


On the non-efficient PAC learnability of conjunctive queries

arXiv.org Artificial Intelligence

Conjunctive queries (CQs) are an extensively studied database query language that plays a prominent role in database theory.

An efficient PAC algorithm is a (possibly randomized) polynomial-time algorithm that takes as input a set of examples drawn from an unknown probability distribution D and labeled as positive/negative according to an unknown


SAT-Based PAC Learning of Description Logic Concepts

arXiv.org Artificial Intelligence

We propose bounded fitting as a scheme for learning description logic concepts in the presence of ontologies. A main advantage is that the resulting learning algorithms come with theoretical guarantees regarding their generalization to unseen examples in the sense of PAC learning. We prove that, in contrast, several other natural learning algorithms fail to provide such guarantees. As a further contribution, we present the system SPELL which efficiently implements bounded fitting for the description logic $\mathcal{ELH}^r$ based on a SAT solver, and compare its performance to a state-of-the-art learner.
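The idea of bounded fitting can be illustrated with a minimal sketch: search for a fitting concept of size 1, then size 2, and so on, and return the first one found. The sketch below is not SPELL. It assumes plain EL without an ontology, replaces the SAT encoding with brute-force enumeration, and uses hypothetical names throughout (`concepts_of_size`, `holds`, `bounded_fitting`, and the toy individuals `a`, `b`, `c`, `k1`).

```python
# EL concepts as nested tuples (an encoding chosen for this sketch only):
#   ("top",)  ("atom", A)  ("and", C, D)  ("exists", r, C)

def concepts_of_size(n, atoms, roles):
    """Yield every EL concept with exactly n symbols (n >= 1)."""
    if n == 1:
        yield ("top",)
        for a in atoms:
            yield ("atom", a)
        return
    # exists r.C uses one symbol for the restriction plus size(C) = n - 1
    for r in roles:
        for c in concepts_of_size(n - 1, atoms, roles):
            yield ("exists", r, c)
    # C and D uses one symbol for the conjunction; split the rest
    for k in range(1, n - 1):
        for c in concepts_of_size(k, atoms, roles):
            for d in concepts_of_size(n - 1 - k, atoms, roles):
                yield ("and", c, d)

def holds(concept, x, labels, edges):
    """Check whether individual x is an instance of the concept."""
    kind = concept[0]
    if kind == "top":
        return True
    if kind == "atom":
        return concept[1] in labels.get(x, set())
    if kind == "and":
        return (holds(concept[1], x, labels, edges)
                and holds(concept[2], x, labels, edges))
    # kind == "exists": some r-successor of x satisfies the filler
    return any(holds(concept[2], y, labels, edges)
               for (s, r, y) in edges if s == x and r == concept[1])

def bounded_fitting(pos, neg, labels, edges, atoms, roles, max_size):
    """Return a smallest concept fitting the examples, or None."""
    for n in range(1, max_size + 1):
        for c in concepts_of_size(n, atoms, roles):
            if (all(holds(c, p, labels, edges) for p in pos)
                    and not any(holds(c, q, labels, edges) for q in neg)):
                return c
    return None

# Toy data (hypothetical): only individual a contains something.
labels = {"a": {"Bicycle"}, "b": {"Bicycle"}, "c": {"Car"}, "k1": {"Basket"}}
edges = [("a", "contains", "k1")]
atoms = ["Basket", "Bicycle", "Car"]
roles = ["contains"]

result = bounded_fitting(["a"], ["b", "c"], labels, edges, atoms, roles, 4)
print(result)
```

Because sizes are tried in increasing order, the returned concept is one of minimal size among those that fit. The enumeration is exponential in the size bound, so it is only practical for very small bounds.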


Conjunctive Queries: Unique Characterizations and Exact Learnability

arXiv.org Artificial Intelligence

We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples, and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.
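The homomorphism lattice mentioned above orders finite structures by the existence of homomorphisms between them. As a rough illustration of that underlying relation only (not of the frontier constructions, which this sketch does not attempt), the following brute-force check enumerates all homomorphisms between two small structures. The fact encoding and the names `homomorphisms` and `has_homomorphism` are assumptions made for this example, and the search is exponential in the size of the source structure.

```python
from itertools import product

def homomorphisms(src_facts, dst_facts):
    """Yield every map h from the source domain to the target domain
    that sends each source fact to a target fact.
    Facts are pairs (relation_name, tuple_of_elements)."""
    src_dom = sorted({x for (_, args) in src_facts for x in args})
    dst_dom = sorted({x for (_, args) in dst_facts for x in args})
    dst = set(dst_facts)
    # Try every assignment of target elements to source elements.
    for image in product(dst_dom, repeat=len(src_dom)):
        h = dict(zip(src_dom, image))
        if all((rel, tuple(h[x] for x in args)) in dst
               for (rel, args) in src_facts):
            yield h

def has_homomorphism(src_facts, dst_facts):
    """True if at least one homomorphism exists."""
    return next(homomorphisms(src_facts, dst_facts), None) is not None

# Toy directed graphs over a single binary relation E (hypothetical data).
path2 = [("E", ("a", "b")), ("E", ("b", "c"))]
cycle2 = [("E", ("u", "v")), ("E", ("v", "u"))]
cycle3 = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]

print(has_homomorphism(path2, cycle2))   # a path folds onto the 2-cycle
print(has_homomorphism(cycle3, cycle2))  # an odd cycle does not
```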


Learning Multilingual Word Embeddings Using Image-Text Data

arXiv.org Artificial Intelligence

There has been significant interest recently in learning multilingual word embeddings -- in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.


Declarative Statistical Modeling with Datalog

arXiv.org Artificial Intelligence

Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics. Our Datalog extension provides convenient mechanisms to include numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program; these outcomes are minimal solutions with respect to a related program with existentially quantified variables in conclusions. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. We focus on programs that use discrete numerical distributions, but even then the space of possible outcomes may be uncountable (as a solution can be infinite). We define a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We show that the resulting semantics is robust under different chases. We also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). We argue that the framework we propose retains the purely declarative nature of Datalog, and allows for natural specifications of statistical models.