Towards Discovery of Influence and Personality Traits through Social Link Prediction
Nguyen, Thin (Curtin University of Technology) | Phung, Dinh (Curtin University of Technology) | Adams, Brett (Curtin University of Technology) | Venkatesh, Svetha (Curtin University of Technology)
Estimation of a person's influence and personality traits from social media data has many applications. We use social linkage criteria, such as number of followers and friends, as proxies to form corpora from the popular blogging site LiveJournal, for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed using automatically derived psycholinguistic and mood-based features of a user's textual messages. We experiment with three sub-corpora of 10,000 users each, and present the most effective predictors for each category. The best classification result, at 80%, is achieved using psycholinguistic features; e.g., influentials are found to use more complex language and more leisure-related terms than non-influentials.
Information Propagation on the Web: Data Extraction, Modeling and Simulation
Nel, François (LIP6 - UPMC) | Lesot, Marie-Jeanne (LIP6 - UPMC) | Delavallade, Thomas (Thales Land and Joint Systems) | Capet, Philippe (Thales Land and Joint Systems)
This paper proposes a model of information propagation mechanisms on the Web, describing all steps of its design and use in simulation. First the characteristics of a real network are studied, in particular in terms of citation policies: from a network extracted from the Web by a crawling tool, distinct publishing behaviours are identified and characterised. The Zero Crossing model for information diffusion is then extended to increase its expressive power and allow it to reproduce this variety of behaviours. Experimental results based on a simulation validate the proposed extension.
Sentiment Flow Through Hyperlink Networks
Miller, Mahalia (Stanford University) | Sathi, Conal (Stanford University) | Wiesenthal, Daniel (Stanford University) | Leskovec, Jure (Stanford University) | Potts, Christopher (Stanford University)
How does sentiment flow through hyperlink networks? Earlier work on hyperlink networks has focused on the structure of the network, often modeling posts as nodes in a directed graph in which edges represent hyperlinks. At the same time, sentiment analysis has largely focused on classifying texts in isolation. Here we analyze a large hyperlinked network of mass media and weblog posts to determine how sentiment features of a post affect the sentiment of connected posts and the structure of the network itself. We explore the phenomenon of sentiment flow through experiments on a graph containing nearly 8 million nodes and 15 million edges. Our analysis indicates that (1) nodes are strongly influenced by their immediate neighbors, (2) deep cascades lead complex but predictable lives, (3) shallow cascades tend to be objective, and (4) sentiment becomes more polarized as depth increases.
Exploring Feature Definition and Selection for Sentiment Classifiers
Mejova, Yelena (University of Iowa) | Srinivasan, Padmini (University of Iowa)
In this paper, we systematically explore feature definition and selection strategies for sentiment polarity classification. We begin by exploring basic questions, such as whether to use stemming, term frequency versus binary weighting, negation-enriched features, and n-grams or phrases. We then move on to more complex aspects, including feature selection using frequency-based vocabulary trimming, part-of-speech and lexicon selection (three types of lexicons), and expected Mutual Information (MI). Using three product and movie review datasets of various sizes, we show, for example, that some techniques are more beneficial for larger datasets than for smaller ones. A classifier trained on only a few features ranked high by MI outperformed one trained on all features on large datasets, yet this did not hold on the small dataset. Finally, we perform a space and computation cost analysis to further understand the merits of various feature types.
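The abstract above does not spell out how features are ranked by MI. As a rough illustration only, the following sketch assumes the standard textbook definition of expected mutual information between the presence of a term and the class label, and keeps the top-k terms. The function names and the four toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Expected MI (in bits) between presence of `term` and the class label.

    `docs` is a list of sets of tokens; `labels` is a parallel list of classes.
    """
    n = len(docs)
    classes = set(labels)
    # Joint counts over (term present?, class label).
    counts = Counter((term in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for present in (True, False):
        p_term = sum(counts[(present, c)] for c in classes) / n
        for label in classes:
            joint = counts[(present, label)] / n
            if joint == 0.0:
                continue  # the 0 * log 0 term is taken as 0
            p_label = sum(counts[(p, label)] for p in (True, False)) / n
            mi += joint * math.log2(joint / (p_term * p_label))
    return mi

def top_k_terms(docs, labels, k):
    """Return the k terms with the highest MI (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]

# Hypothetical toy corpus: two positive and two negative reviews.
docs = [
    {"great", "plot", "movie"},
    {"great", "acting", "movie"},
    {"dull", "plot", "movie"},
    {"dull", "acting", "movie"},
]
labels = ["pos", "pos", "neg", "neg"]

selected = top_k_terms(docs, labels, 2)
print(selected)
```

In this toy corpus, "great" and "dull" each determine the label completely, while "movie" appears in every review and so carries no information about the class; a classifier would then be trained on the selected terms only.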
Twitter Sentiment Analysis: The Good the Bad and the OMG!
Kouloumpis, Efthymios (i-sieve Technologies) | Wilson, Theresa (Johns Hopkins University) | Moore, Johanna (University of Edinburgh)
In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.
Supervised Topic Segmentation of Email Conversations
Joty, Shafiq (University of British Columbia) | Carenini, Giuseppe (University of British Columbia) | Murray, Gabriel (University of British Columbia) | Ng, Raymond T (University of British Columbia)
We propose a graph-theoretic supervised topic segmentation model for email conversations which combines (i) lexical knowledge, (ii) conversational features, and (iii) topic features. We compare our results with the existing unsupervised models (i.e., LCSeg and LDA), and with their two extensions for email conversations (i.e., LCSeg+FQG and LDA+FQG) that not only use lexical information but also exploit finer conversation structure. Empirical evaluation shows that our supervised model is the best performer and achieves the highest accuracy by combining the three different knowledge sources, with knowledge about the conversation proving to be the most important indicator for segmenting emails.
Structure and Reciprocity in Technology-Centered Q&A Communities
Jiang, Ming (University of Michigan) | Dong, Tao (University of Michigan) | Chang, Yung-Ju (University of Michigan)
In this paper we examine the network structure of the MythTV mailing list, an online technology Q&A user community, and we use time-series analysis techniques to study users' reciprocity behavior in this community. We find that the amount of help users provide is strongly correlated to the amount of help they receive. Further, by conducting the Granger Causality test on the time series data of active users' activity, we find that the amount of help given is actually the reason why one gets a lot of help. This finding corresponds to the concept of directed reciprocity in social networks and provides insights into social dynamics in technology-centered online communities.
Identifying Users Across Social Tagging Systems
Iofciu, Tereza (Leibniz University Hannover) | Fankhauser, Peter (Leibniz University Hannover) | Abel, Fabian (TU Delft) | Bischoff, Kerstin (Leibniz University Hannover)
How much do tagging activities tell about a user? Is it possible to identify people in Delicious based on the tags they use in Flickr? In this paper we study those questions and investigate whether users can be identified across social tagging systems. We combine two kinds of information: their user ids and their tags. We introduce and compare a variety of approaches to measure the distance between user profiles for identification. With the best performing combination we achieve, depending on the actual settings, accuracies of between 60% and 80%, which demonstrates that the traces of Web 2.0 users can reveal a great deal about their identity.
Relevance Modeling for Microblog Summarization
Harabagiu, Sanda (University of Texas at Dallas) | Hickl, Andrew (Language Computer Corporation)
This paper introduces a new type of summarization task, known as microblog summarization, which aims to synthesize content from multiple microblog posts on the same topic into a human-readable prose description of fixed length. Our approach leverages (1) a generative model which induces event structures from text and (2) a user behavior model which captures how users convey relevant content.
Exploring Text Virality in Social Networks
Guerini, Marco (Fondazione Bruno Kessler - IRST) | Strapparava, Carlo (Fondazione Bruno Kessler - IRST) | Ozbal, Gozde (Fondazione Bruno Kessler - IRST)
This paper aims to shed some light on the concept of virality - especially in social networks - and to provide new insights on its structure. We argue that: (a) virality is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it; and (b) virality is a phenomenon with many facets, i.e., this generic term covers several different effects of persuasive communication that only partially overlap. To support our claims, we provide initial experiments in a machine learning framework to show how various aspects of virality can be independently predicted from content features.