Detecting and Tracking Political Abuse in Social Media

AAAI Conferences

We study astroturf political campaigns on microblogging platforms: politically-motivated individuals and organizations that use multiple centrally-controlled accounts to create the appearance of widespread support for a candidate or opinion. We describe a machine learning framework that combines topological, content-based and crowdsourced features of information diffusion networks on Twitter to detect the early stages of viral spreading of political misinformation. We present promising preliminary results with better than 96% accuracy in the detection of astroturf content in the run-up to the 2010 U.S. midterm elections.


The Effect of Mobile Platforms on Twitter Content Generation

AAAI Conferences

The increased popularity of feature-rich mobile devices in recent years has enabled widespread consumption and production of social media content via mobile devices. Because mobile devices and mobile applications change the context within which an individual generates and consumes microblog content, we might expect microblogging behavior to differ depending on whether the user is using a mobile device. To our knowledge, little has been established about what, if any, effects such mobile interfaces have on microblogging. In this paper, we investigate this question within the context of Twitter, among the most popular microblogging platforms. This work makes three specific contributions. First, we quantify the ways in which user profiles are affected by the mobile context: (1) the extent to which users tend to be either fully non-mobile or mobile and (2) the relative activity of the mobile Twitter community. Second, we assess the differences in content between mobile and non-mobile tweets (posts to the Twitter platform). Our results show that mobile platforms produce very different patterns of Twitter usage. As part of our analysis, we propose and apply a classification system for tweets. We consider this to be the third contribution of this work. While other classification systems have been proposed, ours is the first to permit the independent encoding of a tweet’s form, content, and intended audience. In this paper we apply this system to show how tweets differ between mobile and non-mobile contexts. However, because of its flexibility and breadth, the schema may be useful to researchers studying Twitter content in other contexts as well.


Generate Adjective Sentiment Dictionary for Social Media Sentiment Analysis Using Constrained Nonnegative Matrix Factorization

AAAI Conferences

Although sentiment analysis has attracted a lot of research, little work has been done on social media data compared to product and movie reviews. This is due to the low accuracy that results from the more informal writing seen in social media data. Currently, most sentiment analysis tools for social media choose the lexicon-based approach instead of the machine learning approach because the latter faces the huge challenge of obtaining enough human-labeled training data for extremely large-scale and diverse social opinion data. The lexicon-based approach requires a sentiment dictionary to determine opinion polarity. This dictionary can also provide useful features for any supervised learning method of the machine learning approach. However, many benchmark sentiment dictionaries do not cover the many informal and spoken words used in social media. In addition, they cannot be updated frequently to include newly generated words online. In this paper, we present an automatic sentiment dictionary generation method, the Constrained Symmetric Nonnegative Matrix Factorization (CSNMF) algorithm, which assigns polarity scores to each word in the dictionary, using a large social media corpus from digg.com. Moreover, we describe our Amazon Mechanical Turk (AMT) study of social media word polarity, and we compare our generated dictionary against both the human-labeled dictionaries from AMT and the General Inquirer Lexicon. In our experiment, we show that combining links from both WordNet and the corpus to generate sentiment dictionaries outperforms using only one of them, and that words with higher sentiment scores yield better precision. Finally, we conducted a lexicon-based sentiment analysis on human-labeled social comments using our generated sentiment dictionary to show the effectiveness of our method.
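
As a rough illustration of the lexicon-based approach mentioned in the abstract above, the sketch below scores a comment by averaging the polarity of the words found in a sentiment dictionary. The dictionary entries, their scores, and the function name `lexicon_score` are hypothetical and chosen for illustration only; this is not the CSNMF method, which is what would generate such a dictionary.

```python
import re

# Hypothetical polarity dictionary; in the paper's setting this would be
# the automatically generated dictionary, not a hand-written one.
EXAMPLE_POLARITY = {"awesome": 0.9, "lame": -0.7, "meh": -0.2}

def lexicon_score(text, polarity):
    """Average polarity of the dictionary words that occur in the text.

    Returns 0.0 (neutral) when no word of the text is in the dictionary.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [polarity[t] for t in tokens if t in polarity]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    for comment in ["That was awesome", "so lame and meh", "no opinion here"]:
        print(comment, "->", lexicon_score(comment, EXAMPLE_POLARITY))
```

A positive score suggests positive polarity and a negative score suggests negative polarity; words missing from the dictionary contribute nothing, which is why dictionary coverage of informal vocabulary matters.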


The Prevalence of Political Discourse in Non-Political Blogs

AAAI Conferences

Though political theorists have emphasized the importance of political discussion in non-political spaces, past study of online political discussion has focused on primarily political websites. Using a random sample from Blogger.com, we find that 25% of all political posts are from blogs that post about politics less than 20% of the time, because the vast majority of blogs post about politics some of the time but infrequently. Far from being taboo topics in those non-political blogs, political posts got slightly more comments than non-political posts in those same blogs, and the comments overwhelmingly engage the political topics of the post, mostly agreeing but frequently disagreeing as well. We argue that non-political spaces devoted primarily to personal diaries, hobbies, and other topics represent a substantial place of online political discussion and should be a site for further study.


Extracting Meta Statements from the Blogosphere

AAAI Conferences

Information extraction systems have been recently proposed for organizing and exploring content in large online text corpora as information networks. In such networks, the nodes are named entities (e.g., people, organizations) while the edges correspond to statements indicating relations among such entities. To date, such systems extract rather primitive networks, capturing only those relations which are expressed by direct statements. In many applications, it is useful to also extract more subtle relations which are often expressed as meta statements in the text. These can, for instance, provide the context for a statement (e.g., “Google acquired YouTube on October 2006”), or repercussion about a statement (e.g., “The US condemned Russia’s invasion of Georgia”). In this work, we report on a system for extracting relations expressed in both direct statements and meta statements. We propose a method based on Conditional Random Fields that explores syntactic features to extract both kinds of statements seamlessly. We follow the Open Information Extraction paradigm, where a classifier is trained to recognize any type of relation instead of specific ones. Finally, our results show substantial improvements over a state-of-the-art information extraction system, both in terms of accuracy and, especially, recall.


Task Specialization in Social Production Communities: The Case of Geographic Volunteer Work

AAAI Conferences

In social production communities, users' individual and collective efforts lead to the creation of valuable resources (cf. Wikipedia, Open Street Map, and Reddit). Contributors to such communities often specialize in the tasks they choose to do. We found evidence for specialization by work type in Cyclopath, a geographic wiki for bicyclists: most users edit a single type of map feature, such as points of interest or roads and trails. We also saw a user lifecycle effect: as users gain experience, they specialize in editing roads and trails. Our findings suggest more effective ways to organize social production interfaces, compose units of work, and match them to users who want to help.


On Prediction Using Variable Order Markov Models

arXiv.org Artificial Intelligence

This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
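
To make the evaluation measure concrete, the sketch below computes the average log-loss (in bits per symbol) of a deliberately simple predictor: it uses the longest context of length at most `max_order` that was seen in training, with add-one smoothing. This is an illustrative stand-in, not CTW, PPM, or PST as studied in the paper; the function names and the smoothing choice are assumptions.

```python
import math
from collections import defaultdict

def train_counts(sequence, max_order):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(sequence):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[sequence[i - k:i]][symbol] += 1
    return counts

def predict(counts, history, symbol, alphabet, max_order):
    """P(symbol | history) from the longest context seen in training.

    Add-one smoothing keeps every probability strictly positive
    (an assumption of this sketch, not a choice taken from the paper).
    """
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        if context in counts:
            total = sum(counts[context].values())
            seen = counts[context].get(symbol, 0)
            return (seen + 1) / (total + len(alphabet))
    return 1.0 / len(alphabet)  # nothing learned: uniform guess

def average_log_loss(counts, test, alphabet, max_order):
    """Mean of -log2 P(x_i | x_1..x_{i-1}) over the test sequence."""
    loss = 0.0
    for i, symbol in enumerate(test):
        loss -= math.log2(predict(counts, test[:i], symbol, alphabet, max_order))
    return loss / len(test)

if __name__ == "__main__":
    counts = train_counts("abababab", max_order=1)
    print(average_log_loss(counts, "abab", "ab", max_order=1))
```

A lower average log-loss means the predictor assigned higher probability to the symbols that actually occurred; an untrained predictor over a two-symbol alphabet scores exactly 1 bit per symbol.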


A Comprehensive Trainable Error Model for Sung Music Queries

arXiv.org Artificial Intelligence

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.


Monte Carlo Methods for Tempo Tracking and Rhythm Quantization

arXiv.org Artificial Intelligence

The continuous hidden variables denote the tempo. Exact computation of posterior features such as the MAP state is intractable in this model class, so we introduce Monte Carlo methods for integration and optimization. The methods can be applied in both online and batch scenarios such as tempo tracking and transcription and are thus potentially useful in a number of music applications such as adaptive automatic accompaniment, score typesetting and music information retrieval. However, when operating on sampled audio data from polyphonic acoustical signals, extraction of a score-like description is a very challenging auditory scene analysis task (Vercoe, Gardner, & Scheirer, 1998). In this paper, we focus on a subproblem in music-ir, where we assume that exact timing information of notes is available, for example as a stream of MIDI events from a digital keyboard (MIDI: Musical Instruments Digital Interface; each time a key is pressed, a MIDI keyboard generates a short message containing pitch and key velocity). One example is automatic score typesetting. In conventional music notation, the onset time of each note is implicitly represented by the cumulative sum of durations of previous notes. Durations are encoded by simple rational numbers (e.g., quarter note, eighth note), consequently all events in music are placed on a discrete grid. This is due to the fact that musicians introduce intentional (and unintentional) deviations from a mechanical prescription. For example timing of events can be deliberately delayed or pushed. Moreover, the tempo can fluctuate by slowing down or accelerating. In fact, such deviations are natural aspects of expressive performance; in the absence of these, music tends to sound rather dull and mechanical. On the other hand, if these deviations are not accounted for during transcription, resulting scores have often very poor quality.
Robust and fast quantization and tempo tracking is also an important requirement for interactive performance systems; applications that "listen" to a performer for generating an accompaniment or improvisation in real time (Raphael, 2001b; Thom, 2000). At last, such models are also useful in musicology for systematic study and characterization of expressive timing by principled analysis of existing performance data. From a theoretical perspective, simultaneous quantization and tempo tracking is a "chicken-and-egg" problem: the quantization depends upon the intended tempo interpretation and the tempo interpretation depends upon the quantization. Apparently, human listeners can resolve this ambiguity (in most cases) without any effort.
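
The two ideas in this excerpt, score onsets as cumulative sums of rational durations and quantization of performed onsets onto a discrete grid, can be sketched as follows. The quantizer here simply rounds to the nearest grid point under a known, constant beat period; that is a naive baseline for illustration, not the Monte Carlo method of the paper, and the function names and the `subdivisions` parameter are assumptions.

```python
from fractions import Fraction
from itertools import accumulate

def score_onsets(durations):
    """Onset of each note in beats: the sum of all previous notes' durations."""
    if not durations:
        return []
    return [Fraction(0)] + list(accumulate(durations))[:-1]

def quantize(onsets_sec, beat_period_sec, subdivisions=4):
    """Snap performed onsets (seconds) to the nearest grid point in beats.

    Assumes the tempo is known and constant, which is exactly the
    assumption that expressive performances violate.
    """
    grid = []
    for t in onsets_sec:
        beats = t / beat_period_sec
        grid.append(Fraction(round(beats * subdivisions), subdivisions))
    return grid

if __name__ == "__main__":
    # Quarter, eighth, eighth, quarter, measured in beats.
    durations = [Fraction(1), Fraction(1, 2), Fraction(1, 2), Fraction(1)]
    print(score_onsets(durations))
    # Slightly "expressive" performance at 0.5 s per beat.
    print(quantize([0.02, 0.49, 0.77, 1.01], beat_period_sec=0.5))
```

Changing `beat_period_sec` changes which grid points the same performed onsets snap to, which is the dependence of quantization on tempo interpretation that the excerpt describes.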


Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic

arXiv.org Artificial Intelligence

This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.