Dredze, Mark

Examining Patterns of Influenza Vaccination in Social Media

AAAI Conferences

Traditional data on influenza vaccination has several limitations: high cost, limited coverage of underrepresented groups, and low sensitivity to emerging public health issues. Social media, such as Twitter, provide an alternative way to understand a population’s vaccination-related opinions and behaviors. In this study, we build and employ several natural language classifiers to examine and analyze behavioral patterns regarding influenza vaccination in Twitter across three dimensions: temporality (by week and month), geography (by US region), and demography (by gender). Our best results are highly correlated official government data, with a correlation over 0.90, providing validation of our approach. We then suggest a number of directions for future work.

Twitter as a Source of Global Mobility Patterns for Social Good

arXiv.org Machine Learning

Data on human spatial distribution and movement is essential for understanding and analyzing social systems. However existing sources for this data are lacking in various ways; difficult to access, biased, have poor geographical or temporal resolution, or are significantly delayed. In this paper, we describe how geolocation data from Twitter can be used to estimate global mobility patterns and address these shortcomings. These findings will inform how this novel data source can be harnessed to address humanitarian and development efforts.

Collective Supervision of Topic Models for Predicting Surveys with Social Media

AAAI Conferences

This paper considers survey prediction from social media. We use topic models to correlate social media messages with survey outcomes and to provide an interpretable representation of the data. Rather than rely on fully unsupervised topic models, we use existing aggregated survey data to inform the inferred topics, a class of topic model supervision referred to as collective supervision. We introduce and explore a variety of topic model variants and provide an empirical analysis, with conclusions of the most effective models for this task.

Studying Anonymous Health Issues and Substance Use on College Campuses with Yik Yak

AAAI Conferences

This study investigates the public health intelligence utility of Yik Yak, a social media platform that allows users to anonymously post and view messages within precise geographic locations. Our dataset contains 122,179 “yaks” collected from 120 college campuses across the United States during 2015. We first present an exploratory analysis of the topics commonly discussed in Yik Yak, clarifying the health issues for which this may serve as a source of information. We then present an in-depth content analysis of data describing substance use, an important public health issue that is not often discussed in public social media, but commonly discussed on Yik Yak under the cloak of anonymity.

Worldwide Influenza Surveillance through Twitter

AAAI Conferences

We evaluate the performance of Twitter-based influenza surveillance in ten English-speaking countries across four continents. We find that tweets are positively correlated with existing surveillance data provided by government agencies in these countries, with r values ranging from .37–.81. We show that incorporating Twitter data into a strong autoregressive baseline reduces mean squared error in 80 to 100 percent of locations depending on the lag, with larger improvements when reporting delays are longer.

Exploring Health Topics in Chinese Social Media: An Analysis of Sina Weibo

AAAI Conferences

This paper seeks to identify and characterize health-related topics discussed on the Chinese microblogging website, Sina Weibo. We identified nearly 1 million messages containing health-related keywords, filtered from a dataset of 93 million messages spanning five years. We applied probabilistic topic models to this dataset and identified the prominent health topics. We show that a variety of health topics are discussed in Sina Weibo, and that four flu-related topics are correlated with monthly influenza case rates in China.

Measuring Post Traumatic Stress Disorder in Twitter

AAAI Conferences

Traditional mental health studies rely on information primarily collected through personal contact with a health care professional. Recent work has shown the utility of social media data for studying depression, but there have been limited evaluations of other mental health conditions. We consider post traumatic stress disorder (PTSD), a serious condition that affects millions worldwide, with especially high rates in military veterans. We also present a novel method to obtain a PTSD classifier for social media using simple searches of available Twitter data, a significant reduction in training data cost compared to previous work. We demonstrate its utility by examining differences in language use between PTSD and random individuals, building classifiers to separate these two groups and by detecting elevated rates of PTSD at and around U.S. military bases using our classifiers.

Carmen: A Twitter Geolocation System with Applications to Public Health

AAAI Conferences

Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, cover a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.

What Affects Patient (Dis)satisfaction? Analyzing Online Doctor Ratings with a Joint Topic-Sentiment Model

AAAI Conferences

We analyze patient reviews of doctors using a novel probabilistic joint model of topic and sentiment based on factorial LDA (Paul and Dredze 2012). We leverage this model to exploit a small set of previously annotated reviews to automatically analyze the topics and sentiment latent in over 50,000 online reviews of physicians (and we make this dataset publicly available). The proposed model outperforms baseline models for this task with respect to model perplexity and sentiment classification. We report the most representative words with respect to positive and negative sentiment along three clinical aspects, thus complementing existing qualitative work exploring patient reviews of physicians.