Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog (the so-called warded Datalog+-) under set semantics. From a theoretical point of view, this allows us to reason about bag semantics by exploiting the well-established theoretical foundations of set semantics. From a practical point of view, it allows us to handle the bag semantics of Datalog with powerful existing query engines for the targeted extension of Datalog. Moreover, the translation lends itself to further extensions -- above all, to capturing the bag semantics of the Semantic Web query language SPARQL.
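The core idea can be illustrated with a toy sketch (this is not the paper's actual warded Datalog+- translation, just the underlying intuition): under set semantics duplicates collapse, but tagging each derivation with a distinct identifier yields a genuine set from which the original bag is recoverable.

```python
from collections import Counter

# A bag of derived facts: the tuple ("a", "b") is derived twice.
bag = [("a", "b"), ("a", "b"), ("c", "d")]

# Naive set semantics loses the multiplicity information.
as_set = set(bag)
assert len(as_set) == 2

# Encoding: pair each occurrence with a fresh identifier. The result is a
# genuine set, yet the bag is recoverable by projecting the identifier away.
encoded = {(fact, i) for i, fact in enumerate(bag)}
recovered = Counter(fact for fact, _ in encoded)
assert recovered[("a", "b")] == 2
```

Projecting away the identifiers recovers exactly the original multiplicities, which is what lets set-based machinery answer bag-semantics questions.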
The advent of the Web of Data kindled interest in link-traversal (or lookup-based) query processing methods, in which queries are answered by dereferencing a potentially large number of small, interlinked sources. While several algorithms for query evaluation have been proposed, there exists no notion of completeness for the results of queries evaluated in this way. In this paper, we motivate the need for clearly defined completeness classes, present several notions of completeness for queries over Linked Data based on the idea of authoritativeness of sources, and show the relations between the different completeness classes.
Conversations in social media often involve irony or sarcasm, in which users say the opposite of what they really mean. Irony markers are the meta-communicative clues that inform the reader that an utterance is ironic. We propose a thorough analysis of theoretically grounded irony markers in two social media platforms: Twitter and Reddit. Classification and frequency analyses show that for Twitter, typographic markers such as emoticons and emojis are the most discriminative for recognizing ironic utterances, while for Reddit the morphological markers (e.g., interjections, tag questions) are the most discriminative.
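A marker frequency analysis of this kind can be sketched as follows. The marker inventories below are tiny, hypothetical stand-ins; the paper's actual, theoretically grounded lists are larger.

```python
import re
from collections import Counter

# Hypothetical marker inventories -- illustrative only.
EMOTICONS = {":)", ":(", ";)", ":P"}          # typographic markers
INTERJECTIONS = {"wow", "oh", "yay", "ugh"}   # morphological markers

def count_markers(utterance):
    """Count typographic vs. morphological irony markers in one utterance."""
    tokens = utterance.lower().split()
    counts = Counter()
    counts["typographic"] = sum(t in EMOTICONS for t in tokens)
    counts["morphological"] = sum(t.strip(",.!?") in INTERJECTIONS for t in tokens)
    # Tag questions such as "isn't it?" at the end of the utterance.
    if re.search(r"\b(isn't|aren't|don't|doesn't) (it|they|you)\?$", utterance.lower()):
        counts["morphological"] += 1
    return counts

c = count_markers("Wow , great weather again :)")
assert c["typographic"] == 1 and c["morphological"] == 1
```

Aggregating such counts over labeled ironic and non-ironic utterances is what allows per-platform comparisons of which marker family is most discriminative.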
In my previous blog, I showed you how to integrate Stanford CoreNLP with Talend using a simple example. In this post I'll show you how to modify that code to make the most of Talend's strengths as a data integration tool. Below is a Talend job I have built to read some tweets from a database (see this blog article for information on how to retrieve tweets with Talend), run the text through the CoreNLP sentiment analysis code, and then write the tweets back to the database with the sentiment added. In this particular example, the texts to be analysed are tweets from a database. However, the same job will work with any string input.
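The shape of the job (read rows, score each text, write the score back) can be mirrored in a few lines. Note this is only a structural sketch: the real job calls the Java CoreNLP pipeline from within Talend, so the word-list scorer below is a loudly-labeled placeholder, and the in-memory SQLite table stands in for the tweet database.

```python
import sqlite3

def sentiment(text):
    """Placeholder scorer standing in for the CoreNLP sentiment call."""
    positive = {"love", "great", "good"}
    negative = {"hate", "awful", "bad"}
    words = set(text.lower().split())
    if words & positive:
        return "Positive"
    if words & negative:
        return "Negative"
    return "Neutral"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, sentiment TEXT)")
conn.executemany("INSERT INTO tweets (text) VALUES (?)",
                 [("I love this tool",), ("awful traffic today",)])

# Read each tweet, score it, and write the sentiment back -- the same
# read / enrich / write-back pattern as the Talend job.
for row_id, text in conn.execute("SELECT id, text FROM tweets").fetchall():
    conn.execute("UPDATE tweets SET sentiment = ? WHERE id = ?", (sentiment(text), row_id))

results = dict(conn.execute("SELECT text, sentiment FROM tweets"))
assert results["I love this tool"] == "Positive"
```

In the Talend job itself, the scoring step is the component that wraps the CoreNLP code from the previous post; everything around it is ordinary database input and output.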
This track includes data-related tasks such as analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy, with a special focus on social data on the web. Hence, the broader context of the track encompasses AI, web mining, information retrieval, natural language processing, and sentiment analysis. As the web rapidly evolves, web users are evolving with it. In an era of social connectedness, people are becoming increasingly enthusiastic about interacting, sharing, and collaborating through social networks, online communities, blogs, wikis, and other online collaborative media. In recent years, this collective intelligence has spread to many different areas, with a particular focus on fields related to everyday life such as commerce, tourism, education, and health, causing the size of the social web to expand exponentially. The distillation of knowledge from such a large amount of unstructured information, however, is an extremely difficult task, as the contents of today’s web are perfectly suitable for human consumption but remain hardly accessible to machines. The opportunity to capture the opinions of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised growing interest both within the scientific community, leading to many exciting open challenges, and in the business world, due to the remarkable benefits to be had from marketing and financial market prediction. The primary aim of this track is to explore the new frontiers of big data computing for opinion mining and sentiment analysis through machine learning techniques, knowledge-based systems, and adaptive and transfer learning, in order to more efficiently retrieve and extract social information from the web.