Datalog: Bag Semantics via Set Semantics

Artificial Intelligence

Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog (so-called warded Datalog±) under set semantics. From a theoretical point of view, this allows us to reason about bag semantics using the well-established theoretical foundations of set semantics. From a practical point of view, it allows us to handle the bag semantics of Datalog with powerful existing query engines for the required extension of Datalog. Moreover, this translation has the potential for further extensions, above all to capture the bag semantics of the Semantic Web query language SPARQL.
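To make the idea of handling bags with sets concrete, the following is a minimal sketch and not the translation from the paper. It assumes that each duplicate fact carries a distinct identifier and that each derivation receives a fresh identifier built from the identifiers in the rule body. The rule `path2`, the sample data, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import count

# Hypothetical bag of edge facts: fact -> multiplicity.
edge_bag = Counter({("a", "b"): 2, ("b", "c"): 3})


def two_hop_bag(edges):
    """Bag semantics for  path2(X, Z) :- edge(X, Y), edge(Y, Z).

    Each matching pair of body facts contributes the product of
    their multiplicities to the head fact.
    """
    result = Counter()
    for (x, y1), m1 in edges.items():
        for (y2, z), m2 in edges.items():
            if y1 == y2:
                result[(x, z)] += m1 * m2
    return result


def to_tagged_set(bag):
    """Turn a bag into a plain set by giving every copy its own identifier."""
    fresh = count()
    return {(next(fresh), *fact) for fact, mult in bag.items() for _ in range(mult)}


def two_hop_set(tagged_edges):
    """Set semantics for the same rule over identifier-tagged facts.

    The head identifier is the pair of body identifiers, so distinct
    derivations yield distinct tuples and the set keeps all of them.
    """
    return {
        ((t1, t2), x, z)
        for (t1, x, y1) in tagged_edges
        for (t2, y2, z) in tagged_edges
        if y1 == y2
    }


def collapse(tagged_paths):
    """Drop the identifiers again and count how often each fact occurs."""
    return Counter((x, z) for (_tid, x, z) in tagged_paths)


bag_answer = two_hop_bag(edge_bag)
set_answer = collapse(two_hop_set(to_tagged_set(edge_bag)))
```

In this sketch both evaluations assign the same multiplicity to every derived fact, which is the property a bag-to-set translation needs to preserve. Recursion and the wardedness condition are outside its scope.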

On Completeness Classes for Query Evaluation on Linked Data

AAAI Conferences

The advent of the Web of Data kindled interest in link-traversal (or lookup-based) query processing methods, in which queries are answered by dereferencing a potentially large number of small, interlinked sources. While several algorithms for query evaluation have been proposed, there exists no notion of completeness for the results of queries evaluated in this way. In this paper, we motivate the need for clearly defined completeness classes, present several notions of completeness for queries over Linked Data based on the idea of authoritativeness of sources, and show the relation between the different completeness classes.

"With 1 Follower I Must Be AWESOME :P." Exploring the Role of Irony Markers in Irony Recognition

AAAI Conferences

Conversations in social media often involve irony or sarcasm, where users say the opposite of what they really mean. Irony markers are the meta-communicative clues that inform the reader that an utterance is ironic. We propose a thorough analysis of theoretically grounded irony markers on two social media platforms: Twitter and Reddit. Classification and frequency analyses show that for Twitter, typographic markers such as emoticons and emojis are the most discriminative markers for recognizing ironic utterances, while for Reddit the morphological markers (e.g., interjections, tag questions) are the most discriminative.

Sentiment Analysis with Talend & Stanford CoreNLP

Datalytyx

In my previous blog post, I showed you how to integrate Stanford CoreNLP with Talend using a simple example. In this post, I'll show you how to modify that code to make the most of Talend's strengths as a data integration tool. Below is a Talend job I built to read some tweets from a database (see this blog article for information on how to retrieve tweets with Talend), run the text through the CoreNLP sentiment analysis code, and then write the tweets back to the database along with their sentiment. In this particular example, the text to be analysed consists of tweets coming from a database. However, the same job will work with any string input.
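For readers who want to see the read, analyse, and write-back flow in code form, here is a rough sketch in Python. It is not the Talend job or the CoreNLP code from the post. The `tweets` table, its columns, and the `score_sentiment` function are assumptions made for illustration, and `score_sentiment` is a toy keyword rule standing in for the real sentiment call.

```python
import sqlite3


def score_sentiment(text):
    """Hypothetical stand-in for the CoreNLP sentiment step (toy keyword rule)."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "good")):
        return "Positive"
    if any(word in lowered for word in ("hate", "awful", "bad")):
        return "Negative"
    return "Neutral"


def annotate_tweets(conn):
    """Read every tweet, score its text, and write the label back to the same row."""
    rows = conn.execute("SELECT id, text FROM tweets").fetchall()
    for tweet_id, text in rows:
        conn.execute(
            "UPDATE tweets SET sentiment = ? WHERE id = ?",
            (score_sentiment(text), tweet_id),
        )
    conn.commit()


# An in-memory database with an assumed schema keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO tweets (text) VALUES (?)",
    [("I love this product",), ("This is awful",), ("Flight lands at noon",)],
)

annotate_tweets(conn)
results = conn.execute("SELECT text, sentiment FROM tweets ORDER BY id").fetchall()
```

Because the scorer only takes a string, the same loop would work for any text column, which mirrors the point that the job is not specific to tweets.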