Discourse & Dialogue
Gaussian Process Topic Models
Agovic, Amrudin, Banerjee, Arindam
We introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic Models (CTMs) using ideas from Gaussian Process (GP) based embedding. Since GPTMs work with both a topic covariance matrix and a document kernel matrix, learning GPTMs involves a novel component: solving a suitable Sylvester equation capturing both topic and document dependencies. The efficacy of GPTMs is demonstrated with experiments evaluating the quality of both topic modeling and embedding.
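For orientation, the generic textbook form of a Sylvester equation is shown below. The symbols A, B, C, and X are placeholders chosen here for illustration; the abstract does not state which matrices play these roles in GPTMs, so the mapping onto the topic covariance matrix and the document kernel matrix is left open.

```latex
A X + X B = C, \qquad
A \in \mathbb{R}^{m \times m},\quad
B \in \mathbb{R}^{n \times n},\quad
X, C \in \mathbb{R}^{m \times n}
```

Given A, B, and C, the unknown is X. A unique solution exists exactly when A and -B share no eigenvalue.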
FoodMood: Measuring Global Food Sentiment One Tweet at a Time
Dixon, Natalie (Affect Lab Foundation) | Jakic, Bruno (AI Applied) | Lagerweij, Roderick (AI Applied) | Mooij, Mark (AI Applied) | Yudin, Ekaterina (Affect Lab Foundation)
Do Happy Meals really make us happy? Do salads make us blue? Is cake our comfort? FoodMood is an interactive data visualisation project that gives citizens a rare opportunity to engage and reflect, acknowledge, and understand the connection between emotion, obesity and food. The project explores the opportunities presented by the data-sharing world of today’s cities using global English-language tweets about food coupled with sentiment analysis. It aims to gain a better understanding of global food consumption patterns and their impact on the daily emotional well-being of people against the backdrop of country data such as Gross Domestic Product (GDP) and obesity levels. A key finding is that tweets can be used to find a relationship between certain foods, food sentiment and obesity levels in countries. Overall, FoodMood shows a majority positive sentiment towards food. Other findings, although constantly evolving, indicate trends such as: globally, meat enjoys a high sentiment rating and is often tweeted about; fast-food companies dominate the food consumption landscapes of most countries’ tweets, although not all of them enjoy equal sentiment ratings across countries. Ultimately, FoodMood reveals a hidden layer of meaningful digital, social, and cultural data that provide a basis for further analysis.
Talk of the City: Our Tweets, Our Community Happiness
Quercia, Daniele (University of Cambridge) | Seaghdha, Diarmuid O (University of Cambridge) | Crowcroft, Jon (University of Cambridge)
The literature of urban sociology and that of psychology have separately established two relationships: the first has linked characteristics of a community to its residents' well-being, the second has linked well-being of individuals to their use of words. No one has hitherto explored the potential transitive relationship: that between characteristics of a community and its residents' use of words. We test this relationship by performing three steps. We consider Twitter users in a variety of London census communities; extract the subject matter of their tweets using "topic models"; and study the relationship between topics and community socio-economic well-being. We find that certain topics are correlated (positively and negatively) with community deprivation. Users in more deprived communities tweet about wedding parties, matters expressed in Spanish/Portuguese, and celebrity gossip. By contrast, those in less deprived communities tweet about vacations, professional use of social media, environmental issues, sports, and health issues. We finally show that monitoring the subject matter of tweets not only offers insights into community well-being, but is also a reasonable way of predicting community deprivation scores.
Emotional Divergence Influences Information Spreading in Twitter
Pfitzner, Rene (ETH Zurich) | Garas, Antonios (ETH Zurich) | Schweitzer, Frank (ETH Zurich)
We analyze data about the micro-blogging site Twitter using sentiment extraction techniques. From an information perspective, Twitter users are involved mostly in two processes: information creation and subsequent distribution (tweeting), and pure information distribution (retweeting), with a pronounced preference for the first. However, a rather substantial fraction of tweets are retweeted. Here, we address the role of the sentiment expressed in tweets for their potential aftermath. We find that although the overall sentiment (polarity) does not influence the probability of a tweet being retweeted, a new measure called "emotional divergence" does have an impact. In general, tweets with high emotional diversity have a better chance of being retweeted, hence influencing the distribution of information.
Tracking Sentiment and Topic Dynamics from Social Media
He, Yulan (The Open University) | Lin, Chenghua (The Open University) | Gao, Wei (Qatar Foundation) | Wong, Kam-Fai (The Chinese University of Hong Kong)
We propose a dynamic joint sentiment-topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic specific word distributions are generated according to the word distributions at previous epochs. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011.
Tweetin' in the Rain: Exploring Societal-Scale Effects of Weather on Mood
Hannak, Aniko (Northeastern University) | Anderson, Eric (Northeastern University) | Barrett, Lisa Feldman (Northeastern University) | Lehmann, Sune (Technical University of Denmark) | Mislove, Alan (Northeastern University) | Riedewald, Mirek (Northeastern University)
There has been significant recent interest in using the aggregate sentiment from social media sites to understand and predict real-world phenomena. However, the data from social media sites also offers a unique and, so far, unexplored opportunity to study the impact of external factors on aggregate sentiment, at the scale of a society. Using a Twitter-specific sentiment extraction methodology, we explore the patterns of sentiment present in a corpus of over 1.5 billion tweets. We focus primarily on the effect of the weather and time on aggregate sentiment, evaluating how clearly the well-known individual patterns translate into population-wide patterns. Using machine learning techniques on the Twitter corpus correlated with the weather at the time and location of the tweets, we find that aggregate sentiment follows distinct climate, temporal, and seasonal patterns.
Happy, Nervous or Surprised? Classification of Human Affective States in Social Media
Choudhury, Munmun De (Microsoft Research, Redmond) | Gamon, Michael (Microsoft Research, Redmond) | Counts, Scott (Microsoft Research, Redmond)
Sentiment classification has been a well-investigated research area in the computational linguistics community. However, most of the research is primarily focused on detecting simply the polarity in text, often needing extensive manual labeling of ground truth. Additionally, little attention has been directed towards a finer analysis of human moods and affective states. Motivated by research in psychology, we propose and develop a classifier of several human affective states in social media. Starting with about 200 moods, we utilize Mechanical Turk studies to derive naturalistic signals from posts shared on Twitter about a variety of affects of individuals. This dataset is then deployed in an affect classification task with promising results. Our findings indicate that different types of affect involve different emotional content and usage styles; hence the performance of the classifier on various affects can differ considerably.
Visualizing Topic Models
Chaney, Allison June-Barlow (Princeton University) | Blei, David M. (Princeton University)
Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method that learns the underlying themes in a large collection of otherwise unorganized documents. This discovered structure summarizes and organizes the documents. However, topic models are high-level statistical tools—a user must scrutinize numerical distributions to understand and explore their results. In this paper, we present a method for visualizing topic models. Our method creates a navigator of the documents, allowing users to explore the hidden structure that a topic model discovers. These browsing interfaces reveal meaningful patterns in a collection, helping end-users explore and understand its contents in new ways. We provide open source software of our method.
Coping with the Document Frequency Bias in Sentiment Classification
Rafrafi, Abdelhalim (University Pierre et Marie Curie) | Guigue, Vincent (University Pierre et Marie Curie) | Gallinari, Patrick (University Pierre et Marie Curie)
In this article, we study the polarity detection problem using linear supervised classifiers. We show the benefit of penalizing the document frequencies in the regularization process to increase the accuracy. We propose a systematic comparison of different loss and regularization functions on this particular task using the Amazon dataset. Then, we evaluate our models according to three criteria: accuracy, sparsity and subjectivity. The subjectivity is measured by projecting our dictionary and optimized weight vector on the SentiWordNet lexicon. This original approach highlights a bias in the selection of the relevant terms during the regularization procedure: frequent terms are overweighted compared to their intrinsic subjectivities. We show that this bias appears whatever the chosen loss or regularization and on all datasets: it is closely linked to the gradient descent technique. Penalizing the document frequency during the learning step enables us to significantly improve our performance. Many sentiment markers appear rarely and are thus undervalued by statistical learning algorithms. Explicitly boosting their influence increases the accuracy in the sentiment classification task.
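The idea of penalizing document frequency inside the regularizer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: the abstract does not give the exact penalty, so the sketch assumes a logistic loss and an L2 term whose per-term strength is proportional to that term's document frequency. The function names `df_weighted_penalty` and `df_penalized_loss` and the parameter `lam` are hypothetical.

```python
import math

def df_weighted_penalty(weights, doc_freq):
    """L2 penalty in which each squared weight is scaled by its term's
    document frequency (assumed form; frequent terms are penalized more)."""
    return sum(df * w * w for df, w in zip(doc_freq, weights))

def df_penalized_loss(weights, docs, labels, doc_freq, lam=0.1):
    """Mean logistic loss plus the document-frequency-weighted penalty.

    docs     -- list of sparse documents, each a dict {term_index: count}
    labels   -- list of +1 / -1 polarity labels
    doc_freq -- fraction of documents containing each term
    lam      -- hypothetical regularization strength
    """
    data_loss = 0.0
    for x, y in zip(docs, labels):
        score = sum(weights[j] * v for j, v in x.items())
        data_loss += math.log1p(math.exp(-y * score))
    data_loss /= len(docs)
    return data_loss + lam * df_weighted_penalty(weights, doc_freq)

# Toy data: term 0 occurs in every document, term 1 in half of them.
toy_docs = [{0: 1, 1: 1}, {0: 1}]
toy_labels = [1, -1]
toy_doc_freq = [1.0, 0.5]
```

With this form, a unit weight on the frequent term costs twice as much as a unit weight on the rare one, which is one way to keep rare sentiment markers from being shrunk as aggressively as common terms.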
Crossing Media Streams with Sentiment: Domain Adaptation in Blogs, Reviews and Twitter
Mejova, Yelena (The University of Iowa) | Srinivasan, Padmini (The University of Iowa)
Most sentiment analysis studies address classification of a single source of data such as reviews or blog posts. However, the multitude of social media sources available for text analysis lends itself naturally to domain adaptation. In this study, we create a dataset spanning three social media sources -- blogs, reviews, and Twitter -- and a set of 37 common topics. We first examine sentiments expressed in these three sources while controlling for the change in topic. Then using this multi-dimensional data we show that when classifying documents in one source (a target source), models trained on other sources of data can be as good as or even better than those trained on the target data. That is, we show that models trained on some social media sources are generalizable to others. All source adaptation models we implement show reviews and Twitter to be the best sources of training data. It is especially useful to know that models trained on Twitter data are generalizable, since, unlike reviews, Twitter is more topically diverse.