Information Technology
Automatic Versus Human Navigation in Information Networks
West, Robert (Stanford University) | Leskovec, Jure (Stanford University)
People regularly face tasks that can be understood as navigation in information networks, where the goal is to find a path between two given nodes. In many such situations, the navigator only gets local access to the node currently under inspection and its immediate neighbors. This lack of global information about the network notwithstanding, humans tend to be good at finding short paths, despite the fact that real-world networks are typically very large. One potential reason for this could be that humans possess vast amounts of background knowledge about the world, which they leverage to make good guesses about possible solutions. In this paper we ask the question: Are human-like high-level reasoning skills really necessary for finding short paths? To answer this question, we design a number of navigation agents without such skills, which use only simple numerical features. We evaluate the agents on the task of navigating Wikipedia, a domain for which we also possess large-scale human navigation data. We observe that the agents find shorter paths than humans on average and therefore conclude that, perhaps surprisingly, no sophisticated background knowledge or high-level reasoning is required for navigating the complex Wikipedia network.
Towards Analyzing Micro-Blogs for Detection and Classification of Real-Time Intentions
Banerjee, Nilanjan (IBM Research - India) | Chakraborty, Dipanjan (IBM Research - India) | Joshi, Anupam (IBM Research - India) | Mittal, Sumit (IBM Research - India, New Delhi) | Rai, Angshu (IBM Research - India) | Ravindran, Balaraman (Indian Institute of Technology, Madras)
Micro-blog forums, such as Twitter, constitute a powerful medium today that people use to express their thoughts and intentions on a daily, and in many cases, hourly, basis. Extracting ‘Real-Time Intention’ (RTI) of a user from such short text updates is a huge opportunity towards web personalization and social net- working around dynamic user context. In this paper, we explore the novel problem of detecting and classifying RTIs from micro-blogs. We find that employing a heuristic based ensemble approach on a reduced dimension of the feature space, based on a wide spectrum of linguistic and statistical features of RTI expressions, achieves significant improvement in detect- ing RTIs compared to word-level features used in many social media classification tasks today. Our solution approach takes into account various salient characteristics of micro-blogs towards such classification – high dimensionality, sparseness of data, limited context, grammatical in-correctness, etc.
Emotional Divergence Influences Information Spreading in Twitter
Pfitzner, Rene (ETH Zurich) | Garas, Antonios (ETH Zurich) | Schweitzer, Frank (ETH Zurich)
We analyze data about the micro-blogging site Twitter using sentiment extraction techniques. From an information perspective, Twitter users are involved mostly in two processes: information creation and subsequent distribution (tweeting), and pure information distribution (retweeting), with pronounced preference to the first. However a rather substantial fraction of tweets are retweeted. Here, we address the role of the sentiment expressed in tweets for their potential aftermath. We find that although the overall sentiment (polarity) does not influence the probability of a tweet to be retweeted, a new measure called "emotional divergence" does have an impact. In general, tweets with high emotional diversity have a better chance of being retweeted, hence influencing the distribution of information.
What's in Your Tweets? I Know Who You Supported in the UK 2010 General Election
Boutet, Antoine (INRIA Rennes Bretagne Atlantique) | Kim, Hyoungshick (University of Cambridge) | Yoneki, Eiko (University of Cambridge)
Nowadays, the use of social media such as Twitter is necessary to monitor trends of people on political issues. As a case study, we collected the main stream of Twitter related to the 2010 UK general election during the associated period. We analyse the characteristics of the three main parties in the election. Also, we propose a simple and practical algorithm to identify the political leaning of users using the amount of Twitter messages which seem related to political parties. The experimental results showed that the best-performing classification method -- which uses the number of Twitter messages referring to a particular political party -- achieved about 86% classification accuracy without any training phase.
The Length of Bridge Ties: Structural and Geographic Properties of Online Social Interactions
Volkovich, Yana (Barcelona Media Foundation) | Scellato, Salvatore (University of Cambridge) | Laniado, David (Barcelona Media Foundation) | Mascolo, Cecilia (University of Cambridge) | Kaltenbrunner, Andreas (Barcelona Media Foundation)
The popularity of the Web has allowed individuals to communicate and interact with each other on a global scale: people connect both to close friends and acquaintances, creating ties that can bridge otherwise separated groups of people. Recent evidence suggests that spatial distance is still affecting social links established on online platforms, with online ties preferentially connecting closer people. In this work we study the relationships between interaction strength, spatial distance and structural position of ties between members of a large-scale online social networking platform, Tuenti. We discover that ties in highly connected social groups tend to span shorter distances than connections bridging together otherwise separated portions of the network. We also find that such bridging connections have lower social interaction levels than ties within the inner core of the network and ties connecting to its periphery. Our results suggest that spatial constraints on online social networks are intimately connected to structural network properties, with important consequences for information diffusion.
Who Does What on the Web: A Large-Scale Study of Browsing Behavior
Goel, Sharad (Yahoo! Research) | Hofman, Jake M. (Yahoo! Research) | Sirer, M. Irmak (Northwestern University)
As the Web has become integrated into daily life, understanding how individuals spend their time online impacts domains ranging from public policy to marketing. It is difficult, however, to measure even simple aspects of browsing behavior via conventional methods---including surveys and site-level analytics---due to limitations of scale and scope. In part addressing these limitations, large-scale Web panel data are a relatively novel means for investigating patterns of Internet usage. In one of the largest studies of browsing behavior to date, we pair Web histories for 250,000 anonymized individuals with user-level demographics---including age, sex, race, education, and income---to investigate three topics. First, we examine how behavior changes as individuals spend more time online, showing that the heaviest users devote nearly twice as much of their time to social media relative to typical individuals. Second, we revisit the digital divide, finding that the frequency with which individuals turn to the Web for research, news, and healthcare is strongly related to educational background, but not as closely tied to gender and ethnicity. Finally, we demonstrate that browsing histories are a strong signal for inferring user attributes, including ethnicity and household income, a result that may be leveraged to improve ad targeting.
Cultural Analytics of Large Datasets from Flickr
Ushizima, Daniela (Lawrence Berkeley National Laboratory) | Manovich, Lev (University of California, San Diego) | Margolis, Todd (University of California, San Diego) | Douglas, Jeremy (Ashford University)
Deluge became a metaphor to describe the amount of information to which we are subjected, and very often we feel we are drowning while our access to information is rising. Devising mechanisms for exploring massive image sets according to perceptual attributes is still a challenge, even more when dealing with user-generated social media content. Such images tend to be heterogenous, and using metadata-only can be misleading. This paper describes a set of tools designed to analyze large sets of user-created art related images using image features describing color, texture, composition and orientation. The proposed pipeline permits to discriminate Flickr groups in terms of feature vectors and clustering parameters. The algorithms are general enough to be applied to other domains in which the main question is about the variability of the images.
Happy, Nervous or Surprised? Classification of Human Affective States in Social Media
Choudhury, Munmun De (Microsoft Research, Redmond) | Gamon, Michael (Microsoft Research, Redmond) | Counts, Scott (Microsoft Research, Redmond)
Sentiment classification has been a well-investigated research area in the computational linguistics community. However, most of the research is primarily focused on detecting simply the polarity in text, often needing extensive manual labeling of ground truth. Additionally, little attention has been directed towards a finer analysis of human moods and affective states. Motivated by research in psychology, we propose and develop a classifier of several human affective states in social media. Starting with about 200 moods, we utilize mechanical turk studies to derive naturalistic signals from posts shared on Twitter about a variety of affects of individuals. This dataset is then deployed in an affect classification task with promising results. Our findings indicate that different types of affect involve different emotional content and usage styles; hence the performance of the classifier on various affects can differ considerably.
Enhancing Event Descriptions through Twitter Mining
Tanev, Hristo (Joint Research Centre, European Commission) | Ehrmann, Maud (Joint Research Centre, European Commission) | Piskorski, Jakub (Frontex) | Zavarella, Vanni (Joint Research Centre, European Commission)
We describe a simple IR approach for linking news about events, detected by an event extraction system, to messages from Twitter (tweets). In particular, we explore several methods for creating event-specific queries for Twitter and provide a quantitative and qualitative evaluation of the relevance and usefulness of the information obtained from the tweets. We showed that methods based on utilization of word co-occurrence clustering, domain-specific keywords and named entity recognition improve the performance with respect to a basic approach.
Modeling Diffusion in Social Networks Using Network Properties
Luu, Duc Minh (Singapore Management University) | Lim, Ee-Peng (Singapore Management University) | Hoang, Tuan-Anh (Singapore Management University) | Chua, Freddy Chong Tat (Singapore Management University)
Diffusion of items occurs in social networks due to spreading of items through word of mouth and exogenous factors. These items may be news, products, videos, advertisements or contagious viruses. Previous research has studied diffusion process at both the macro and micro levels. The former models the number of item adopters in the diffusion process while the latter determines which individuals adopt item. In this paper, we establish a general probabilistic framework, which can be used to derive macro-level diffusion models, including the well known Bass Model (BM). Using this framework, we develop several other models considering the social network’s degree distribution coupled with the assumption of linear influence by neighboring adopters in the diffusion process. Through some evaluation on synthetic data, this paper shows that degree distribution actually changes during the diffusion process. We therefore introduce a multi-stage diffusion model to cope with variable degree distribution. By conducting experiments on both synthetic and real datasets, we show that our proposed diffusion models can recover the diffusion parameters from the observed diffusion data, which allows us to model diffusion with high accuracy.