Hierarchical Bayesian Models for Latent Attribute Detection in Social Media

AAAI Conferences

We present several novel minimally supervised models for detecting latent attributes of social media users, with a focus on ethnicity and gender. Previous work on ethnicity detection has used coarse-grained, widely separated classes of ethnicity and assumed the existence of large amounts of training data such as the US census, simplifying the problem. Instead, we examine content generated by users in addition to name morpho-phonemics to detect ethnicity and gender. Further, we address this problem in a challenging setting where the ethnicity classes are more fine-grained (ethnicity classes in Nigeria) and with very limited training data.


“Dancing with the Stars,” NBA Games, Politics: An Exploration of Twitter Users’ Response to Events

AAAI Conferences

Microblogging services such as Twitter offer great opportunities for analyzing the reactions of a wide audience with respect to current events. In this paper, we explore the correlation between types of user engagement and events centered around celebrities (e.g., personal or professional events involving Actors, Musicians, Politicians, Athletes).


RT to Win! Predicting Message Propagation in Twitter

AAAI Conferences

Twitter is a very popular way for people to share information on a bewildering multitude of topics. Tweets are propagated using a variety of channels: by following users or lists, by searching, or by retweeting. Of these vectors, retweeting is arguably the most effective, as it can potentially reach the most people, given its viral nature. A key task is predicting whether a tweet will be retweeted, and solving this problem furthers our understanding of message propagation within large user communities. We carry out a human experiment on the task of deciding whether a tweet will be retweeted, which shows that the task is possible, as human performance levels are well above chance. Using a machine learning approach based on the passive-aggressive algorithm, we are able to automatically predict retweets as well as humans. Analyzing the learned model, we find that performance is dominated by social features, but that tweet features add a substantial boost.
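As a rough illustration of the passive-aggressive family of online learners mentioned in the abstract, here is a minimal sketch of the standard PA-I update for binary labels. It is not the authors' implementation: the feature names (`followers`, `has_url`) and the aggressiveness parameter `C` are hypothetical choices made only for this example.

```python
def score(weights, features):
    """Dot product between a sparse weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def pa_update(weights, features, label, C=1.0):
    """Apply one PA-I update in place. `label` is +1 (retweeted) or -1 (not)."""
    loss = max(0.0, 1.0 - label * score(weights, features))  # hinge loss
    norm_sq = sum(value * value for value in features.values())
    if loss > 0.0 and norm_sq > 0.0:
        # Step size: just large enough to fix the margin, capped at C.
        tau = min(C, loss / norm_sq)
        for name, value in features.items():
            weights[name] = weights.get(name, 0.0) + tau * label * value
    return weights


def predict(weights, features):
    """Return +1 or -1 according to the sign of the score."""
    return 1 if score(weights, features) >= 0.0 else -1


# Hypothetical example: one social feature and one tweet feature.
weights = {}
example = {"followers": 1.0, "has_url": 1.0}
pa_update(weights, example, label=1)
```

The learner is "passive" when an example is already classified with margin at least 1 (no change) and "aggressive" otherwise (it moves the weights just far enough to correct the margin, up to the cap `C`).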


Connecting Mutually Influencing Bloggers

AAAI Conferences

The blogosphere shows the characteristics of a power-law distribution, in which a small set of bloggers (influentials) gets the majority of the readership and the vast majority receive little traffic. Blogger recommendation algorithms aim at finding influentials for recommendation, putting bloggers with limited readership at a further disadvantage. These bloggers could benefit from mutually endorsing each other, with the eventual goal of forming strong local communities with broader readership. In this paper, we propose a recommendation algorithm that connects pairs of bloggers with the intent that, once connected, they will share a mutually influencing relationship. In particular, we compute each blogger's influence profile based on how much she influences her blog friends, and we recommend bloggers with similar influence profiles. We characterize bloggers into four groups: global leaders, connectors, local leaders, and isolates. Our results show marginal benefit for isolates and significant benefit for local leaders. Our approach can be instructive in building an intelligent recommendation engine that helps bloggers with limited readership build strong local communities.


An Empirical Study of Geographic User Activity Patterns in Foursquare

AAAI Conferences

We present a large-scale study of user behavior in Foursquare, conducted on a dataset of about 700 thousand users that spans a period of more than 100 days. We analyze user check-in dynamics, demonstrating how they reveal meaningful spatio-temporal patterns and offer the opportunity to study both user mobility and urban spaces. Our aim is to show how researchers could utilise data generated in location-based social networks to attain a deeper understanding of human mobility, and how developers may take advantage of such systems to enhance applications such as recommender systems.


Towards Discovery of Influence and Personality Traits through Social Link Prediction

AAAI Conferences

Estimation of a person's influence and personality traits from social media data has many applications. We use social linkage criteria, such as the number of followers and friends, as proxies to form corpora from the popular blogging site LiveJournal for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed using automatically derived psycholinguistic and mood-based features of a user's textual messages. We experiment with three sub-corpora of 10000 users each, and present the most effective predictors for each category. The best classification result, at 80%, is achieved using psycholinguistic features; e.g., influentials are found to use more complex language than non-influentials and to use more leisure-related terms.


Sentiment Flow Through Hyperlink Networks

AAAI Conferences

How does sentiment flow through hyperlink networks? Earlier work on hyperlink networks has focused on the structure of the network, often modeling posts as nodes in a directed graph in which edges represent hyperlinks. At the same time, sentiment analysis has largely focused on classifying texts in isolation. Here we analyze a large hyperlinked network of mass media and weblog posts to determine how sentiment features of a post affect the sentiment of connected posts and the structure of the network itself. We explore the phenomenon of sentiment flow through experiments on a graph containing nearly 8 million nodes and 15 million edges. Our analysis indicates that (1) nodes are strongly influenced by their immediate neighbors, (2) deep cascades lead complex but predictable lives, (3) shallow cascades tend to be objective, and (4) sentiment becomes more polarized as depth increases.


Twitter Sentiment Analysis: The Good the Bad and the OMG!

AAAI Conferences

In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.


Structure and Reciprocity in Technology-Centered Q&A Communities

AAAI Conferences

In this paper we examine the network structure of the MythTV mailing list, an online technology Q&A user community, and we use time-series analysis techniques to study users’ reciprocity behavior in this community. We find that the amount of help users provide is strongly correlated with the amount of help they receive. Further, by conducting the Granger causality test on the time-series data of active users’ activity, we find that the amount of help a user gives is what drives the amount of help that user receives. This finding corresponds to the concept of directed reciprocity in social networks and provides insights into social dynamics in technology-centered online communities.
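To make the Granger causality test mentioned above more concrete, the following is a minimal sketch of its core F statistic. It compares a regression of one series on its own past against a regression that also includes the past of the other series. The series names, the lag of 1, and the synthetic data are assumptions made for illustration; the abstract does not give the authors' data, lag choice, or tooling.

```python
import random


def _ols_rss(rows, target):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(p):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    coef = [a[i][p] / a[i][i] for i in range(p)]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(cause, effect, lag=1):
    """F statistic for 'cause Granger-causes effect' using `lag` past values."""
    n = len(effect)
    target = effect[lag:]
    # Restricted model: intercept plus the effect's own past values.
    restricted = [[1.0] + [effect[t - k] for k in range(1, lag + 1)]
                  for t in range(lag, n)]
    # Full model: additionally includes the past values of the candidate cause.
    full = [row + [cause[t - k] for k in range(1, lag + 1)]
            for row, t in zip(restricted, range(lag, n))]
    rss_r = _ols_rss(restricted, target)
    rss_f = _ols_rss(full, target)
    dof = len(target) - len(full[0])
    return ((rss_r - rss_f) / lag) / (rss_f / dof)


# Synthetic data (not from the paper): help received follows help given by one step.
random.seed(0)
given = [random.gauss(0, 1) for _ in range(200)]
received = [0.0] + [0.8 * g + random.gauss(0, 0.1) for g in given[:-1]]
f_forward = granger_f(given, received)   # does giving help predict receiving?
f_reverse = granger_f(received, given)   # does receiving help predict giving?
```

A large F statistic in one direction and a small one in the other suggests that the past of the first series helps predict the second but not the reverse. A full test would compare the statistic against the F distribution to obtain a p-value, which this sketch omits.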


Identifying Users Across Social Tagging Systems

AAAI Conferences

How much do tagging activities tell about a user? Is it possible to identify people in Delicious based on the tags they use in Flickr? In this paper we study those questions and investigate whether users can be identified across social tagging systems. We combine two kinds of information: their user ids and their tags. We introduce and compare a variety of approaches to measuring the distance between user profiles for identification. With the best-performing combination we achieve, depending on the actual settings, accuracies of between 60% and 80%, which demonstrates that the traces of Web 2.0 users can reveal quite a lot about their identity.
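One simple way to picture "distance between user profiles" is to treat each profile as a bag of tags and compare tag-frequency vectors. The sketch below uses cosine distance and a nearest-profile lookup. This is an assumption for illustration only: the abstract does not say which distance measures the authors compared, and the user ids and tags here are made up.

```python
import math
from collections import Counter


def cosine_distance(tags_a, tags_b):
    """Cosine distance between two tag lists, treated as frequency vectors."""
    a, b = Counter(tags_a), Counter(tags_b)
    dot = sum(count * b[tag] for tag, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty profile is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def identify(query_tags, candidates):
    """Return the candidate user id whose tag profile is closest to the query."""
    return min(candidates, key=lambda uid: cosine_distance(query_tags, candidates[uid]))


# Hypothetical profiles: tags from one system matched against users of another.
candidates = {
    "user_1": ["python", "linux", "opensource"],
    "user_2": ["travel", "beach", "food"],
}
match = identify(["sunset", "beach", "travel"], candidates)
```

In practice a tag-based distance like this could be combined with a similarity score over user ids, since the abstract notes that both kinds of information are used.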