In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, adaptation performance usually suffers considerably when the data distributions in the source and target domains differ significantly. In this paper, we suggest performing active learning for cross-domain sentiment classification by actively selecting a small amount of labeled data in the target domain. Accordingly, we propose a novel active learning approach for cross-domain sentiment classification. First, we train two individual classifiers, i.e., the source and target classifiers, with the labeled data from the source and target domains, respectively. Second, the two classifiers are employed to select informative samples with the Query By Committee (QBC) selection strategy. Third, the two classifiers are combined to make the classification decision. Importantly, the two classifiers are trained by fully exploiting the unlabeled data in the target domain with the label propagation (LP) algorithm. Empirical studies demonstrate the effectiveness of our active learning approach for cross-domain sentiment classification over some strong baselines.
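The selection and combination steps described above can be illustrated with a minimal sketch. The abstract does not specify how disagreement is measured or how the two classifiers are combined, so the ranking by probability gap and the averaging rule below are assumptions, and the function names `qbc_select` and `combine` are hypothetical. The label propagation step is omitted; the inputs are assumed to be each classifier's estimated probability of the positive class for every unlabeled target sample.

```python
from typing import List, Sequence


def qbc_select(p_source: Sequence[float], p_target: Sequence[float], k: int) -> List[int]:
    """Pick up to k unlabeled samples on which the two classifiers disagree.

    p_source[i] and p_target[i] are the positive-class probabilities that the
    source and target classifiers assign to sample i (assumed inputs).
    """
    candidates = []
    for i, (ps, pt) in enumerate(zip(p_source, p_target)):
        # The committee disagrees when the two predicted labels differ.
        if (ps >= 0.5) != (pt >= 0.5):
            candidates.append((abs(ps - pt), i))
    # Assumption: a larger probability gap means a more informative sample.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [i for _, i in candidates[:k]]


def combine(ps: float, pt: float) -> int:
    """Combine the two classifiers by averaging their probabilities (assumed rule)."""
    return 1 if (ps + pt) / 2 >= 0.5 else 0


p_source = [0.9, 0.2, 0.7, 0.55]
p_target = [0.8, 0.9, 0.1, 0.45]
to_label = qbc_select(p_source, p_target, k=2)
```

In this toy input the classifiers agree on sample 0 and disagree on the other three, so the two samples with the widest gap are the ones sent for annotation.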
Video watching has emerged as one of the most frequent media activities on the Internet. Yet, little is known about how users watch online video. Using two distinct YouTube datasets, a set of random YouTube videos crawled from the Web and a set of videos watched by participants tracked by a Chrome extension, we examine whether and how indicators of collective preferences and reactions are associated with view duration of videos. We show that video view duration is positively associated with the video's view count, the number of likes per view, and the negative sentiment in the comments. These metrics and reactions have significant predictive power over the duration for which a video is watched by individuals. Our findings provide a more precise understanding of user engagement with video content in social media beyond view count.
The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.
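To make the idea of a lexicon combined with rules concrete, here is a toy sketch. It is not the VADER implementation: the four-word lexicon, the three heuristics (intensifiers, negation, exclamation marks), and all weights are invented for illustration, and the abstract itself does not enumerate the five rules.

```python
# Hypothetical toy lexicon and modifiers; not the validated VADER lexicon.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}
BOOSTERS = {"very": 0.5, "extremely": 1.0}
NEGATIONS = {"not", "never", "no"}


def score(text: str) -> float:
    """Sum word valences, adjusted by simple illustrative rules."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        valence = LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else ""
        # Rule 1 (assumed): an intensifier pushes the valence away from zero.
        if prev in BOOSTERS:
            valence += BOOSTERS[prev] if valence > 0 else -BOOSTERS[prev]
            prev = tokens[i - 2] if i > 1 else ""
        # Rule 2 (assumed): a preceding negation flips the sign.
        if prev in NEGATIONS:
            valence = -valence
        total += valence
    # Rule 3 (assumed): up to three exclamation marks amplify the score.
    bangs = min(text.count("!"), 3)
    if total > 0:
        total += 0.25 * bangs
    elif total < 0:
        total -= 0.25 * bangs
    return total
```

Under these made-up weights, `score("not very good")` is negative while `score("great!!")` is larger than `score("great")`, which is the kind of intensity adjustment the rules are meant to capture.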
In the summer of 2013, Brazil experienced a period of conflict triggered by a series of protests. While the popular press covered the events, little empirical work has investigated how first-hand reporting of the protests occurred and evolved over social media and how such exposure in turn impacted the demonstrations themselves. In this study we examine over 42 million tweets shared during the three months of conflict in order to uncover patterns in online and offline protest-related activity as well as to explore relationships between language use in tweets and the emotions and underlying motivations of protesters. Our findings show that peaks in Twitter activity coincide with days on which heavy protesting took place, that the words in tweets reflect emotional characteristics of protest-related events, and, less expectedly, that these emotions convey both positive and negative sentiment.
Social media-based digital epidemiology has the potential to support faster response to and deeper understanding of public health-related threats. This study proposes a new framework to analyze unstructured health-related textual data from Twitter users' posts (tweets) to characterize negative health sentiments and non-health-related concerns in relation to the corpus of negative sentiments regarding Diet, Diabetes, Exercise, and Obesity (DDEO). Through the collection of 6 million tweets over one month, this study identified the prominent topics of users as they relate to the negative sentiments. Our proposed framework uses two text mining methods, sentiment analysis and topic modeling, to discover negative topics. The negative sentiments of Twitter users support the literature narratives and the many morbidity issues that are associated with DDEO and the linkage between obesity and diabetes. The framework offers a potential method to understand the public's opinions and sentiments regarding DDEO. More importantly, this research provides new opportunities for computational social scientists, medical experts, and public health professionals to collectively address DDEO-related issues.