Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)
We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We showed that methods based on utilization of word co-occurrence clustering, domain-specific keywords and named entity recognition improve the performance with respect to a basic approach.
Dinakar, Karthik (Massachusetts Institute of Technology) | Jones, Birago (Massachusetts Institute of Technology) | Lieberman, Henry (Massachusetts Institute of Technology) | Picard, Rosalind (Massachusetts Institute of Technology) | Rose, Carolyn (Carnegie Mellon University) | Thoman, Matthew (Northeastern University) | Reichart, Roi (Massachusetts Institute of Technology)
Adolescent cyber-bullying on social networks is a phenomenon that has received widespread attention. Recent work by sociologists has examined this phenomenon under the larger context of teenage drama and it's manifestations on social networks. Tackling cyber-bullying involves two key components – automatic detection of possible cases, and interaction strategies that encourage reflection and emotional support. Key is showing distressed teenagers that they are not alone in their plight. Conventional topic spotting and document classification into labels like "dating" or "sports" are not enough to effectively match stories for this task. In this work, we examine a corpus of 5500 stories from distressed teenagers from a major youth social network. We combine Latent Dirichlet Allocation and human interpretation of its output using principles from sociolinguistics to extract high-level themes in the stories and use them to match new stories to similar ones. A user evaluation of the story matching shows that theme-based retrieval does a better job of finding relevant and effective stories for this application than conventional approaches.
This blog was originally featured on blog.aylien.com, a Text Analysis blog with tutorials, Data Visualisations and industry discussions. Our founder, Parsa Ghaffari, gave a talk recently on Natural Language Processing and Sentiment Analysis at the Science Gallery in Dublin. As part of the talk, he put together a nice little example of how you can transform your Google Spreadsheet into a powerful Text Analysis and Data Mining tool. In this case, he took a simple example of analyzing restaurant reviews from a popular review site but the same could be done for hotels, products, service offerings and so on. He wanted to show how easy it can be for data geeks and even the less technical marketers among us, to start analyzing text and gathering business insight from the reams of textual data online today.
Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but less so for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system on the text, and (b) translate resources such as sentiment labeled corpora and sentiment lexicons from English into the focus language, and use them as additional resources in the focus-language sentiment analysis system. In this paper we systematically examine both options. We use Arabic social media posts as stand-in for the focus language text. We show that sentiment analysis of English translations of Arabic texts produces competitive results, w.r.t. Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation is different from the sentiment of the source word or text. This is especially relevant for building better automatic translation systems. In the process, we create a state-of-the-art Arabic sentiment analysis system, a new dialectal Arabic sentiment lexicon, and the first Arabic-English parallel corpus that is independently annotated for sentiment by Arabic and English speakers.