We examine whether we can automatically classify the sentiment of individual tweets in Farsi, in order to track how sentiment toward a number of trending political topics changes over time. Working with Farsi tweets adds challenges such as the lack of a sentiment lexicon and part-of-speech taggers, frequent use of colloquial words, and distinctive orthographic and morphological characteristics. We have collected over 1 million tweets on political topics in Farsi, along with an annotated data set of over 3,000 tweets. We find that an SVM classifier with Brown clustering for feature selection yields a median accuracy of 56% and accuracy as high as 70%. We use this classifier to track dynamic sentiment during a key period of Iran's negotiations over its nuclear program.
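The abstract names the classifier (an SVM) and the feature source (Brown clustering) but not the pipeline itself. The sketch below is a minimal two-class illustration under stated assumptions: the word-to-cluster map `BROWN_PATHS` is a hypothetical toy (real paths come from clustering a large unlabeled corpus), the prefix lengths are arbitrary, and the linear SVM is trained with a Pegasos-style stochastic subgradient method rather than any particular SVM library. Brown clustering assigns each word a bit-string path in a binary hierarchy, so path prefixes act as coarse word classes that let a classifier generalize to words absent from the labeled data.

```python
import random
from collections import defaultdict

# Hypothetical Brown-cluster bit strings for a few Farsi words (toy values).
# A real map would come from running Brown clustering on unlabeled tweets.
BROWN_PATHS = {
    "خوب": "0010",     # "good"
    "عالی": "0011",    # "excellent"
    "قشنگ": "0001",    # "beautiful"
    "بد": "1100",      # "bad"
    "افتضاح": "1101",  # "terrible"
    "زشت": "1110",     # "ugly"
}
PREFIX_LENGTHS = (2, 4)  # assumed; shorter prefixes = coarser word classes


def features(tokens):
    """Map tokens to sparse counts of Brown-cluster path prefixes."""
    feats = defaultdict(float)
    for tok in tokens:
        path = BROWN_PATHS.get(tok)
        if path is None:
            continue  # words without a cluster path contribute nothing
        for n in PREFIX_LENGTHS:
            feats[f"p{n}:{path[:n]}"] += 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(examples, lam=0.1, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    examples: list of (tokens, label) pairs with label in {+1, -1}.
    """
    data = [(features(toks), y) for toks, y in examples]
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)  # visit every example once per epoch
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # margin check at the current w
            # L2 regularization step: shrink all weights.
            scale = 1.0 - eta * lam
            w = {k: v * scale for k, v in w.items()}
            # Hinge-loss subgradient step, only if the margin was violated.
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, tokens):
    return 1 if dot(w, features(tokens)) >= 0 else -1
```

Trained on `[(["خوب"], 1), (["عالی"], 1), (["بد"], -1), (["افتضاح"], -1)]`, this model labels `["قشنگ"]` as positive and `["زشت"]` as negative even though neither word appears in the training examples, because they share 2-bit prefixes with the training words. That kind of generalization matters when, as here, only a few thousand tweets are annotated.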
Sentiment analysis is the automated process of understanding an opinion about a given subject from written or spoken language. In a world where we generate 2.5 quintillion bytes of data every day, sentiment analysis has become a key tool for making sense of that data, allowing companies to gain key insights and automate all kinds of processes.
But how does it work? What are the different approaches? What are its caveats and limitations? How can you use sentiment analysis in your business? Below, you'll find the answers to these questions and everything you need to know about sentiment analysis. Whether you are an experienced data scientist, a coder, a marketer, a product analyst, or just getting started, this comprehensive guide is for you.
How Does Sentiment Analysis Work?
Sentiment analysis, also known as opinion mining, is a field within natural language processing (NLP) that builds systems to identify and extract opinions within text. It is currently a topic of great interest and development because it has many practical applications. As the amount of publicly and privately available information on the Internet keeps growing, a large number of texts expressing opinions are available on review sites, forums, blogs, and social media. With the help of sentiment analysis systems, this unstructured information can be automatically transformed into structured data on public opinion about products, services, brands, politics, or any other topic people express opinions about. This data is very useful for commercial applications such as marketing analysis, public relations, product reviews, net promoter scoring, product feedback, and customer service.
Before going into further details, let's first define an opinion. Text information can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about something.
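The step from unstructured text to structured opinion data can be illustrated with the simplest family of methods, lexicon-based scoring. The sketch below assumes a tiny hypothetical lexicon (`LEXICON`) and topic list (`TOPICS`); it sums word polarities, maps the total to a label, and emits one structured record per text. It deliberately ignores negation, intensifiers, and sarcasm, which practical systems have to handle.

```python
import re
from dataclasses import dataclass
from typing import List

# Toy polarity lexicon and topic list (hypothetical, for illustration only).
LEXICON = {"great": 1, "love": 1, "fast": 1, "bad": -1, "slow": -1, "hate": -1}
TOPICS = {"battery", "screen", "delivery"}


@dataclass
class OpinionRecord:
    text: str
    topics: List[str]
    score: int
    label: str


def analyze(text: str) -> OpinionRecord:
    """Turn one free-text snippet into a structured sentiment record."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)  # net word polarity
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    topics = sorted(set(tokens) & TOPICS)  # which known topics are mentioned
    return OpinionRecord(text, topics, score, label)
```

For example, `analyze("Love the screen, but delivery was slow and bad")` yields topics `['delivery', 'screen']`, score `-1`, and label `negative`. The single document-level label hides the positive remark about the screen, which is one reason to model opinions at the level of individual aspects.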
Opinions are usually subjective expressions that describe people's sentiments, appraisals, and feelings toward a subject or topic. In an opinion, the entity the text talks about can be an object, its components, its aspects, its attributes, or its features.
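One common way to make this definition concrete is to represent each opinion as a tuple recording the entity, the aspect being evaluated (a component, attribute, or feature, or the entity as a whole), the polarity, the opinion holder, and the time. The dataclass below is a minimal sketch of that representation; the field names and the `aspect_summary` helper are illustrative, not a standard API. Netting polarity per entity and aspect keeps praise for one component from cancelling out criticism of another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    entity: str    # what the opinion is about, e.g. a product
    aspect: str    # component/attribute of the entity ("GENERAL" for the whole)
    polarity: int  # +1 positive, -1 negative
    holder: str    # who expressed the opinion
    time: str      # when it was expressed (ISO date string here)


def aspect_summary(opinions):
    """Net polarity for each (entity, aspect) pair."""
    totals = defaultdict(int)
    for op in opinions:
        totals[(op.entity, op.aspect)] += op.polarity
    return dict(totals)
```

With one positive opinion about a phone's camera and two negative opinions about its battery, `aspect_summary` returns `{("phone", "camera"): 1, ("phone", "battery"): -2}`, whereas a single overall score would report a mildly negative phone.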
Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)
We describe a simple information retrieval (IR) approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We show that methods based on word co-occurrence clustering, domain-specific keywords, and named entity recognition improve performance relative to a basic approach.
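The abstract does not spell out how the queries are built. As a rough illustration of the co-occurrence idea only, the hypothetical `build_query` below expands seed terms (for example, an event keyword or named entities supplied by the event extraction system) with the words that most often appear alongside them in news sentences, breaking ties alphabetically. It is a plain counting heuristic, not the clustering method evaluated in the paper, and the stopword list is a toy assumption.

```python
import re
from collections import Counter

# Toy stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "on", "and", "was", "were", "to", "at", "after"}


def tokenize(sentence):
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def build_query(sentences, seeds, k=2):
    """Expand seed terms with the k words that co-occur with them most often."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        if tokens & seeds:  # sentence mentions at least one seed term
            counts.update(tokens - seeds)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(seeds) + [word for word, _ in ranked[:k]]
```

Given the sentences "An earthquake struck Nepal on Saturday", "The earthquake killed hundreds in Kathmandu", "Rescue teams reached Kathmandu after the earthquake", and "Markets rose on Saturday", the seed `["earthquake"]` with `k=2` expands to `['earthquake', 'kathmandu', 'hundreds']`: "kathmandu" co-occurs with the seed twice, and the last sentence is ignored because it never mentions the seed.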
Google says it only records interactions with connected devices like the Google Home speaker when we use the wake word, "Hey, Google" or "OK, Google." But when you use many of Google's smartphone apps with a microphone for voice search, or even Google on the desktop with voice commands, it can actually record every word you say to it, whether you use the wake word or not. The fine print is that you have to click on the microphone in the apps to communicate with Google. Once you do that, Google will start transcribing you, word for word, and storing your commands, in text and audio, as USA TODAY discovered in tests this week. This is similar to Google's monitoring of our keystrokes.
Zhang, Wei (Tsinghua University and Tsinghua National Laboratory for Information Science and Technology) | Wang, Jianyong (Tsinghua University and Tsinghua National Laboratory for Information Science and Technology)
User-item connected documents, such as customer reviews of specific items on online shopping websites and user tips in location-based social networks, have become increasingly prevalent. Inferring the topic distributions of user-item connected documents benefits many applications, including document classification and the summarization of users and items. While many different topic models have been proposed for modeling multiple texts, most of them cannot account for the dual role of user-item connected documents (each document is related to one user and one item simultaneously) in the process of generating topic distributions. In this paper, we propose a novel probabilistic topic model called Prior-based Dual Additive Latent Dirichlet Allocation (PDA-LDA). It addresses the dual role of each document by associating its Dirichlet prior over topic distributions with user and item topic factors, which yields a document-level asymmetric Dirichlet prior. We evaluate PDA-LDA on several real datasets, and the results demonstrate that our model is effective compared with several other models, both in held-out perplexity for text modeling and in a document classification application.
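The abstract states that each document's asymmetric Dirichlet prior is derived from user and item topic factors, but it does not give the exact formula. The sketch below assumes one plausible additive form, alpha_k = exp(u_k) + exp(v_k) for user factor u and item factor v (the exponentials keep every component positive), and then draws a document's topic distribution from that Dirichlet by normalizing independent Gamma draws, a standard sampling identity. The function names and the functional form are illustrative assumptions, not PDA-LDA's published parameterization.

```python
import math
import random


def document_prior(user_factor, item_factor):
    """Hypothetical additive prior: alpha_k = exp(u_k) + exp(v_k)."""
    return [math.exp(u) + math.exp(v) for u, v in zip(user_factor, item_factor)]


def expected_topic_distribution(alpha):
    """Mean of a Dirichlet(alpha): alpha normalized to sum to one."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]
```

With user factor `[0.0, 1.0, -1.0]` and item factor `[0.0, 0.0, 2.0]`, the prior is asymmetric (roughly 2.0, 3.72, and 7.76), so documents linking this user and item are expected to place most of their topic mass on the third topic, something a single symmetric prior shared by all documents cannot express.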