Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog (the so-called warded Datalog+-) under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a practical point of view, this allows us to handle the bag semantics of Datalog by powerful, existing query engines for the required extension of Datalog. Moreover, this translation has the potential for further extensions -- above all to capture the bag semantics of the semantic web query language SPARQL.
The advent of the Web of Data kindled interest in link-traversal (or lookup-based) query processing methods, with which queries are answered via dereferencing a potentially large number of small, interlinked sources. While several algorithms for query evaluation have been proposed, there exists no notion of completeness for results of so-evaluated queries. In this paper, we motivate the need for clearly-defined completeness classes and present several notions of completeness for queries over Linked Data, based on the idea of authoritativeness of sources, and show the relation between the different completeness classes.
Conversations in social media often contain the use of irony or sarcasm, when the users say the opposite of what they really mean. Irony markers are the meta-communicative clues that inform the reader that an utterance is ironic. We propose a thorough analysis of theoretically grounded irony markers in two social media platforms: Twitter and Reddit. Classification and frequency analysis shows that for Twitter, typographic markers such as emoticons and emojis are the most discriminative markers to recognize ironic utterances, while for Reddit the morphological markers (e.g., interjections, tag questions) are the most discriminative.
In my previous blog, I showed you how to integrate Stanford CoreNLP with Talend using a simple example. In this post I'll show you how to modify that code in order to make the most of Talend's strengths as a data integration tool. Below is a Talend job I have built to read some tweets from a database (see this blog article for information on how to retrieve tweets with Talend), run the text through the CoreNLP sentiment analysis code, and then write tweets back to the database with the addition of the sentiment. In this particular example, the text to be analysed are tweets coming from a database. However, the same job will work with any string input.
In the right architecture, machine-learning functionality takes data analytics to the next level of value. Editor's note: This guest post (translated from Italian and originally published in late 2016) by Lorenzo Ridi, of Google Cloud Platform partner Noovle of Italy, describes a POC for building an end-to-end analytic pipeline on GCP that includes machine-learning functionality. "Black Friday" is traditionally the biggest shopping day of the year in the United States. Black Friday can be a great opportunity to promote products, raise brand awareness and kick-off the holiday shopping season with a bang. During that period, whatever the type of retail involved, it's also becoming increasingly important to monitor and respond to consumer sentiment and feedback across social media channels.