Automatically Conceptualizing Social Media Analytics Data via Personas

AAAI Conferences

Social media analytics is insightful but can also be difficult to use within organizations. To address this, we present Automatic Persona Generation (APG), a system and methodology for quantitatively generating personas from large amounts of online social media data. APG is operational and has been deployed in a pilot version with several organizations across multiple industry verticals. It uses a robust web framework and a stable back-end database to process tens of millions of user interactions with thousands of online digital products on multiple social media platforms, including Facebook and YouTube. APG identifies distinct and impactful audience segments for an organization and turns them into persona profiles by enriching the social media analytics data with pertinent features such as names, photos, and interests. We demonstrate the architecture, its development, and the main system features. APG provides value for organizations distributing content via online platforms and is unique in its approach to leveraging social media data for audience understanding. APG is online at https://persona.qcri.org.


Characterizing the Demographics Behind the #BlackLivesMatter Movement

AAAI Conferences

Debates on minority issues are often dominated by, or held among, the concerned minorities: gender equality debates have often failed to engage men, while those about race fail to engage the dominant group. To test this observation, we study the #BlackLivesMatter movement and hashtag on Twitter, which emerged and gained traction after a series of events typically involving the deaths of African-Americans as a result of police brutality. We aim to quantify the population biases across user types (individuals vs. organizations) and, for individuals, across three demographic factors (race, gender, and age). Our results suggest that more African-Americans engage with the hashtag, and that they are also more active than other demographic groups. We also discuss ethical caveats with broader implications for studies on sensitive topics (e.g., mental health or religion) that focus on users.


Inferring Latent User Properties from Texts Published in Social Media

AAAI Conferences

We demonstrate an approach to predict latent personal attributes including user demographics, online personality, emotions and sentiments from texts published on Twitter. We rely on machine learning and natural language processing techniques to learn models from user communications. We first examine individual tweets to detect emotions and opinions emanating from them, and then analyze all the tweets published by a user to infer latent traits of that individual. We consider various user properties including age, gender, income, education, relationship status, optimism and life satisfaction. We focus on Ekman’s six emotions: anger, joy, surprise, fear, disgust and sadness. Our work can help social network users to understand how others may perceive them based on how they communicate in social media, in addition to its evident applications in online sales and marketing, targeted advertising, large scale polling and healthcare analytics.
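
The two-stage idea (detect emotions in each tweet, then aggregate over all of a user's tweets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract describes learned machine-learning models, whereas this sketch swaps in a hypothetical toy lexicon (`TOY_LEXICON`) for the tweet-level detector, and the function names are likewise hypothetical. Only the per-user aggregation into a distribution over Ekman's six emotions follows the description above.

```python
from collections import Counter

EKMAN_EMOTIONS = ("anger", "joy", "surprise", "fear", "disgust", "sadness")

# Hypothetical toy lexicon standing in for a trained tweet-level classifier.
TOY_LEXICON = {
    "angry": "anger", "furious": "anger",
    "happy": "joy", "love": "joy",
    "wow": "surprise", "unexpected": "surprise",
    "afraid": "fear", "scared": "fear",
    "gross": "disgust", "yuck": "disgust",
    "sad": "sadness", "miss": "sadness",
}


def tweet_emotions(tweet):
    """Stage 1: the set of Ekman emotions detected in a single tweet."""
    tokens = (tok.strip(".,!?#@").lower() for tok in tweet.split())
    return {TOY_LEXICON[tok] for tok in tokens if tok in TOY_LEXICON}


def user_emotion_profile(tweets):
    """Stage 2: fraction of a user's tweets that express each emotion."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet_emotions(tweet))  # each emotion counted once per tweet
    n = len(tweets)
    return {e: (counts[e] / n if n else 0.0) for e in EKMAN_EMOTIONS}


# Usage with made-up tweets from one hypothetical user.
profile = user_emotion_profile([
    "So happy today, love this!",
    "Stuck in traffic, furious.",
    "Happy Friday!",
    "I miss summer.",
])
# profile["joy"] == 0.5, profile["anger"] == 0.25, profile["sadness"] == 0.25
```

A user-level profile like this is the kind of aggregate feature on which latent traits could then be predicted; that second model (for age, income, and so on) is not sketched here.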


Predicting the Demographics of Twitter Users from Website Traffic Data

AAAI Conferences

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. In this paper, we predict the demographics of Twitter users based on whom they follow. Whereas most prior approaches rely on supervised learning, in which individual users are labeled with demographics, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics using information about the followers of each website on Twitter. The resulting average held-out correlation is 0.77 across six different variables (gender, age, ethnicity, education, income, and child status). We additionally validate the model on a smaller set of Twitter users labeled individually for ethnicity and gender, finding performance that is surprisingly competitive with a fully supervised approach.
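
The distant-supervision step for one demographic variable can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each website is represented by the fraction of its Twitter followers who also follow a few other accounts, and it uses ridge regression with an unpenalized intercept as a stand-in for the unspecified regression model. The feature values are made up, and the targets are synthetic, generated from a known linear rule so the fit can be checked. Held-out correlation is computed with leave-one-out cross-validation, which is also an assumption about the evaluation protocol.

```python
import math
from statistics import mean

# Hypothetical features per website: fraction of the site's Twitter followers
# who also follow each of two other accounts (values are made up).
SITE_FEATURES = [
    (0.60, 0.40), (0.20, 0.50), (0.45, 0.10), (0.10, 0.20),
    (0.75, 0.65), (0.30, 0.80), (0.55, 0.25), (0.15, 0.60),
]
# Synthetic targets (e.g., share of visitors with a bachelor's degree),
# generated from a known linear rule rather than real audience data.
SITE_TARGETS = [0.2 + 0.5 * a - 0.3 * b for a, b in SITE_FEATURES]


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression via normal equations; the last weight is the intercept."""
    Xb = [list(row) + [1.0] for row in X]
    d = len(Xb[0])
    # (X^T X + lam * I) w = X^T y, leaving the intercept unpenalized.
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j and i < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(A, b)


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def leave_one_out_correlation(X, y, lam=1e-3):
    """Hold out each website in turn; correlate held-out predictions with targets."""
    preds = []
    for i in range(len(X)):
        w = fit_ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        preds.append(predict(w, X[i]))
    return pearson(preds, y)


print(round(leave_one_out_correlation(SITE_FEATURES, SITE_TARGETS), 3))
```

Because the synthetic targets are exactly linear, the held-out correlation here is near 1; on real, noisy audience-measurement data it would be lower. Applying the fitted weights to an individual user would additionally require representing that user with the same kind of features, which this sketch does not cover.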


A Comparative Study of Demographic Attribute Inference in Twitter

AAAI Conferences

Social media platforms have become a major gateway for receiving and analyzing public opinion. Understanding users can provide invaluable context for their social media posts and significantly improve traditional opinion analysis models. Demographic attributes, such as ethnicity, gender, and age, have been extensively applied to characterize social media users. While studies have shown that user groups formed by demographic attributes can have coherent opinions towards political issues, these attributes are often not explicitly coded by users in their profiles. Previous work has demonstrated the effectiveness of different user signals, such as users’ posts and names, in determining demographic attributes. Yet these efforts mostly evaluate linguistic signals from users’ posts and train models on artificially balanced datasets. In this paper, we propose a comprehensive list of user signals: self-descriptions and posts aggregated from users’ friends and followers, users’ profile images, and users’ names. We provide a side-by-side comparative study of these signals on the task of inferring three major demographic attributes, namely ethnicity, gender, and age. We use realistic, unbalanced datasets whose demographic makeup is similar to that of Twitter for training models and for evaluation experiments. Our experiments indicate that self-descriptions provide the strongest signal for ethnicity and age inference and clearly improve overall performance when combined with tweets. For gender inference, profile images achieve the highest precision, with an overall score close to the best result in our setting. This suggests that signals in self-descriptions and profile images have the potential to facilitate demographic attribute inference in Twitter and are promising for future investigation.
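
Comparing signals on unbalanced data depends on how scores are averaged across classes. The abstract reports precision and an "overall score" without defining the latter here; the sketch below assumes macro-averaged precision, recall, and F1, which weight every class equally. The labels and the two predictors (`majority_baseline`, `profile_image`) are invented for illustration and are not results from the paper.

```python
def per_class_scores(gold, pred):
    """Precision, recall, and F1 for each label (0.0 when undefined)."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_average(scores):
    """Unweighted mean over labels, so a minority class counts as much as a majority one."""
    n = len(scores)
    return tuple(sum(s[k] for s in scores.values()) / n for k in range(3))


# Hypothetical unbalanced gold labels and predictions from two signals.
gold = ["male"] * 8 + ["female"] * 2
predictions = {
    "majority_baseline": ["male"] * 10,
    "profile_image": ["male"] * 7 + ["female", "female", "male"],
}
results = {name: macro_average(per_class_scores(gold, pred))
           for name, pred in predictions.items()}
```

Both hypothetical predictors are 80% accurate, yet the majority baseline gets a macro-averaged F1 of about 0.44 against 0.6875 for the other, because it never predicts the minority class. This is one reason evaluation on realistically unbalanced data can rank signals differently from evaluation on artificially balanced data.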