Industry
The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City
Cranshaw, Justin (Carnegie Mellon University) | Schwartz, Raz (Carnegie Mellon University) | Hong, Jason (Carnegie Mellon University) | Sadeh, Norman (Carnegie Mellon University)
Studying the social dynamics of a city on a large scale has traditionally been a challenging endeavor, requiring long hours of observation and interviews, usually resulting in only a partial depiction of reality. At the same time, the boundaries of municipal organizational units, such as neighborhoods and districts, are largely statically defined by the city government and do not always reflect the character of life in these areas. To address both difficulties, we introduce a clustering model and research methodology for studying the structure and composition of a city based on the social media its residents generate. We use data from approximately 18 million check-ins collected from users of a location-based online social network. The resulting clusters, which we call Livehoods, are representations of the dynamic urban areas that comprise the city. We take an interdisciplinary approach to validating these clusters, interviewing 27 residents of Pittsburgh, PA, to see how their perceptions of the city project onto our findings there. Our results provide strong support for the discovered clusters, showing how Livehoods reveal the distinctly characterized areas of the city and the forces that shape them.
Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter
Chen, Lu (Wright State University) | Wang, Wenbo (Wright State University) | Nagarajan, Meenakshi (IBM Almaden Research Center) | Wang, Shaojun (Wright State University) | Sheth, Amit P. (Wright State University)
Automatic extraction of sentiment expressions from informal text, as in microblogs such as tweets, is a recent area of investigation. Compared to formal text, such as in product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions that cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., a movie or a person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, since the polarity of a sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus sizes.
Grief-Stricken in a Crowd: The Language of Bereavement and Distress in Social Media
Brubaker, Jed R. (University of California, Irvine) | Kivran-Swaine, Funda (Rutgers University) | Taber, Lee (University of California, Irvine) | Hayes, Gillian R. (University of California, Irvine)
People turn to social media to express their emotions surrounding major life events. The death of a loved one is one scenario in which people share their feelings in the semi-public space of social networking sites. In this paper, we present the results of a two-part investigation of grief and distress in the context of messages posted to the profiles of deceased MySpace users. We present a coding system for identifying emotionally distressed content, followed by a detailed analysis of language use that lays a foundation for natural language processing (NLP) tasks, such as automatic detection of bereavement-related distress. Our findings suggest that in addition to words bearing positive or negative sentiment, linguistic style can be an indicator of messages that demonstrate distress in the space of post-mortem social media content. These results contribute to research in computational linguistics by identifying linguistic features that can be used for automatic classification, as well as to research on death and bereavement by enumerating attributes of distressed self-expression in a post-mortem context.
Modeling Polarizing Topics: When Do Different Political Communities Respond Differently to the Same News?
Balasubramanyan, Ramnath (Carnegie Mellon University) | Cohen, William W (Carnegie Mellon University) | Pierce, Douglas (Rutgers University) | Redlawsk, David P. (Rutgers University)
Political discourse in the United States is becoming increasingly polarized. This polarization frequently causes different communities to react very differently to the same news events. Political blogs, as a form of social media, provide a unique insight into this phenomenon. We present a multitarget, semisupervised latent variable model, MCR-LDA, to model this process by jointly analyzing political blog posts and their comment sections from different political communities to predict the degree of polarization that news topics cause. Inspecting the model after inference reveals topics and the degree to which they trigger polarization. In this approach, community responses to news topics are observed using sentiment polarity and comment volume, which serves as a proxy for the level of interest in the topic. In this context, we also present computational methods to assign sentiment polarity to the comments, which serve as targets for latent variable models that predict the polarity based on the topics in the blog content. Our results show that the joint modeling of communities with different political beliefs using MCR-LDA does not sacrifice accuracy in sentiment polarity prediction when compared to approaches that are tailored to specific communities, and additionally provides a view of the polarization in responses from the different communities.
People Are Strange When You're a Stranger: Impact and Influence of Bots on Social Networks
Aiello, Luca Maria (Università degli Studi di Torino) | Deplano, Martina (Università degli Studi di Torino) | Schifanella, Rossano (Università degli Studi di Torino) | Ruffo, Giancarlo (Università degli Studi di Torino)
Bots are, for many Web and social media users, the source of many dangerous attacks or the carrier of unwanted messages, such as spam. Nevertheless, crawlers and software agents are a precious tool for analysts, and they are continuously executed to collect data or to test distributed applications. However, no one knows the real potential of a bot whose purpose is to control a community, to manipulate consensus, or to influence user behavior. It is commonly believed that the better an agent simulates human behavior in a social network, the more it can succeed in generating an impact in that community. We help shed light on this issue through an online social experiment that studies to what extent a bot with no trust, no profile, and no aim to reproduce human behavior can become popular and influential on a social media site. Results show that a basic social probing activity can be used to acquire social relevance on the network and that the so-acquired popularity can be effectively leveraged to drive users in their social connectivity choices. We also observe that our bot's activity unveiled hidden social polarization patterns in the community and triggered an emotional response from individuals that brings to light subtle privacy hazards perceived by the user base.
Tutorials
Breslin, John (National University of Ireland, Galway)
The ICWSM 2012 conference tutorials will be How to Analyze Massive Social Network Datasets without a Cluster, presented by Derek Ruths; Charting Collections of Connections in Social Media: Creating Maps and Measures with NodeXL, presented by Marc Smith; Evidence-Based Social Design of Online Communities: Getting to Critical Mass and Encouraging Contributions, presented by Paul Resnick and Robert Kraut; Sentiment Mining from User Generated Content, presented by Lyle Ungar and Ronen Feldman; and Information Extraction for Social Media Analysis, presented by Denilson Barbosa.
Isabelle/PIDE as Platform for Educational Tools
Wenzel, Makarius | Wolff, Burkhart
The Isabelle/PIDE platform addresses the question of whether proof assistants of the LCF family are suitable as a technological basis for educational tools. The traditionally strong logical foundations of systems like HOL, Coq, or Isabelle have so far been counter-balanced by somewhat inaccessible interaction via the TTY (or minor variations like the well-known Proof General / Emacs interface). Thus the fundamental question of math education tools with fully formal background theories has often been answered negatively due to accidental weaknesses of existing proof engines. The idea of "PIDE" (which means "Prover IDE") is to integrate existing provers like Isabelle into a larger environment that facilitates access by end-users and other tools. We use Scala to expose the proof engine in ML to the JVM world, where many user interfaces, editor frameworks, and educational tools already exist. This shall ultimately lead to combined mathematical assistants, where the logical engine is in the background, without obstructing the view on applications of formal methods, formalized mathematics, and math education in particular.
Towards an Intelligent Tutor for Mathematical Proofs
Autexier, Serge | Dietrich, Dominik | Schiller, Marvin
Computer-supported learning is an increasingly important form of study since it allows for independent learning and individualized instruction. In this paper, we discuss a novel approach to developing an intelligent tutoring system for teaching textbook-style mathematical proofs. We characterize the particularities of the domain and discuss common ITS design models. Our approach is motivated by phenomena found in a corpus of tutorial dialogs that were collected in a Wizard-of-Oz experiment. We show how an intelligent tutor for textbook-style mathematical proofs can be built on top of an adapted assertion-level proof assistant by reusing representations and proof search strategies originally developed for automated and interactive theorem proving. The resulting prototype was successfully evaluated on a corpus of tutorial dialogs and yields good results.
Towards an efficient prover for the C1 paraconsistent logic
Neto, Adolfo | Kaestner, Celso A. A. | Finger, Marcelo
The KE inference system is a tableau method developed by Marco Mondadori, which was presented as an improvement, in the computational efficiency sense, over Analytic Tableaux. In the literature, there is no description of a theorem prover based on the KE method for the C1 paraconsistent logic. Paraconsistent logics have several applications, such as in robot control and medicine. These applications could benefit from the existence of such a prover. We present a sound and complete KE system for C1, an informal specification of a strategy for the C1 prover, and problem families that can be used to evaluate provers for C1. The C1 KE system and the strategy described in this paper will be used to implement a KE-based prover for C1, which will be useful for those who study and apply paraconsistent logics.
Vector-valued Reproducing Kernel Banach Spaces with Applications to Multi-task Learning
The purpose of this paper is to establish the notion of vector-valued reproducing kernel Banach spaces and demonstrate its applications to multi-task machine learning. Built on the theory of scalar-valued reproducing kernel Hilbert spaces (RKHS) [3], kernel methods have proven successful in single-task machine learning [10, 14, 29, 30, 33]. Multi-task learning, where the unknown target function to be learned from finite sample data is vector-valued, appears more often in practice. References [13, 25] proposed the development of kernel methods for learning multiple related tasks simultaneously. The mathematical foundation used there was the theory of vector-valued RKHS [5, 27].