Ruiz, Camille (Nara Institute of Science and Technology) | Ito, Kaoru (Nara Institute of Science and Technology) | Wakamiya, Shoko (Nara Institute of Science and Technology) | Aramaki, Eiji (Nara Institute of Science and Technology)
Although loneliness is a very familiar emotion, little is known about it. One aspect to explore is the prevalence of loneliness in the connected world that social media sites like Twitter provide. In light of this, this study investigates the Twitter data of users who have expressed loneliness in order to understand the phenomenon. Since our primary material is tweets, we developed various indices that measure the social activities reflected in online and real-life relationships solely through Twitter data. Through these indices, the relations between social activity and loneliness were investigated. The results show that highly lonely users seem to have low online activity, high levels of positive expression about real-life relationships, and narrow ingroups.
In the summer of 2013, Brazil experienced a period of conflict triggered by a series of protests. While the popular press covered the events, little empirical work has investigated how first-hand reporting of the protests occurred and evolved over social media and how such exposure in turn impacted the demonstrations themselves. In this study we examine over 42 million tweets shared during the three months of conflict in order to uncover patterns in online and offline protest-related activity as well as to explore relationships between language-use in tweets and the emotions and underlying motivations of protesters. Our findings show that peaks in Twitter activity coincide with days in which heavy protesting took place, that the words in tweets reflect emotional characteristics of protest-related events, and less expectedly, that these emotions convey both positive as well as negative sentiment.
We seek to determine the effectiveness of using location-based social media to predict the outcome of the 2016 presidential election. To this end, we create a dataset consisting of approximately 3 million tweets ranging from September 22nd to November 8th related to either Donald Trump or Hillary Clinton. Twenty-one states are chosen, with eleven categorized as swing states, five as Clinton-favored, and five as Trump-favored. We incorporate two metrics in polling voter opinion for election outcomes: tweet volume and positive sentiment. Our data is labeled via a convolutional neural network trained on the Sentiment140 dataset. To determine whether Twitter is an indicator of election outcome, we compare our results to the election outcome per state and across the nation. We use two approaches for determining state victories: winner-take-all and shared elector count. Our results show that tweet sentiment mirrors the close races in the swing states; however, the differences in the distribution of positive sentiment and volume between Clinton and Trump are not significant using our approach. Thus, we conclude that neither sentiment nor volume is an accurate predictor of election results using our collection of data and labeling process.
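The two ways of turning per-state scores into elector counts can be illustrated with a minimal sketch. The abstract does not define "shared elector count", so the proportional split below is an assumption, and the state names, candidate labels, scores, and elector numbers are all hypothetical toy values rather than data from the study.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]  # state -> candidate -> score (e.g. positive-tweet count)


def winner_take_all(scores: Scores, electors: Dict[str, int]) -> Dict[str, float]:
    """Give all of a state's electors to the candidate with the highest score."""
    totals: Dict[str, float] = {}
    for state, by_candidate in scores.items():
        # Ties go to the first candidate listed; a real analysis would need a tie rule.
        winner = max(by_candidate, key=by_candidate.get)
        totals[winner] = totals.get(winner, 0) + electors[state]
    return totals


def shared_electors(scores: Scores, electors: Dict[str, int]) -> Dict[str, float]:
    """Assumed reading of 'shared elector count': split electors in proportion to score."""
    totals: Dict[str, float] = {}
    for state, by_candidate in scores.items():
        state_total = sum(by_candidate.values())
        if state_total == 0:
            continue  # no signal for this state, so no electors are assigned
        for candidate, score in by_candidate.items():
            share = electors[state] * score / state_total
            totals[candidate] = totals.get(candidate, 0.0) + share
    return totals


# Hypothetical toy input: two states, two candidates.
toy_scores = {"A": {"X": 60, "Y": 40}, "B": {"X": 45, "Y": 55}}
toy_electors = {"A": 10, "B": 20}

wta = winner_take_all(toy_scores, toy_electors)
shared = shared_electors(toy_scores, toy_electors)
print(wta)
print(shared)
```

On this toy input the winner-take-all rule separates the candidates while the proportional split leaves them level, which shows how the choice of allocation rule alone can change the predicted outcome.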
Incorporating semantic features from the WordNet lexical database is one of the many approaches that have been tried to improve the predictive performance of text classification models. The intuition behind this is that keywords in the training set alone may not be extensive enough to enable generation of a universal model for a category, but if we incorporate the word relationships in WordNet, a more accurate model may be possible. Other researchers have previously evaluated the effectiveness of incorporating WordNet synonyms, hypernyms, and hyponyms into text classification models. Generally, they have found that improvements in accuracy using features derived from these relationships depend on the nature of the text corpora from which the document collections are extracted. In this paper, we not only reconsider the role of WordNet synonyms, hypernyms, and hyponyms in text classification models but also consider the role of WordNet meronyms and holonyms. Incorporating these WordNet relationships into a Coordinate Matching classifier, a Naive Bayes classifier, and a Support Vector Machine (SVM) classifier, we evaluate our approach on six document collections extracted from the Reuters-21578, USENET, and Digi-Trad text corpora. Experimental results show that none of the WordNet relationships were effective at increasing the accuracy of the Naive Bayes classifier. Synonyms, hypernyms, and holonyms were effective at increasing the accuracy of the Coordinate Matching classifier, and hypernyms were effective at increasing the accuracy of the SVM classifier.
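The intuition of expanding keywords with related words can be sketched as follows. This is only an illustration: `TOY_HYPERNYMS` is a hand-written stand-in for WordNet, the category names and terms are hypothetical, and coordinate matching is read here as "count the distinct terms a document shares with a category", which may differ from the paper's exact formulation.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for WordNet hypernym lookups (word -> one more general term).
TOY_HYPERNYMS: Dict[str, str] = {
    "dog": "animal",
    "cat": "animal",
    "oak": "tree",
    "pine": "tree",
}


def expand(tokens: Iterable[str]) -> Set[str]:
    """Return the token set plus the hypernym of every token that has one."""
    token_set = set(tokens)
    return token_set | {TOY_HYPERNYMS[t] for t in token_set if t in TOY_HYPERNYMS}


def coordinate_match(doc_terms: Set[str], category_terms: Set[str]) -> int:
    """Score = number of distinct terms shared by document and category."""
    return len(doc_terms & category_terms)


def classify(
    doc_tokens: List[str],
    categories: Dict[str, List[str]],
    use_hypernyms: bool,
) -> Tuple[str, Dict[str, int]]:
    """Pick the category with the highest coordinate-matching score."""
    doc_terms = expand(doc_tokens) if use_hypernyms else set(doc_tokens)
    scores = {}
    for name, training_tokens in categories.items():
        terms = expand(training_tokens) if use_hypernyms else set(training_tokens)
        scores[name] = coordinate_match(doc_terms, terms)
    # Ties resolve to the first category listed.
    return max(scores, key=scores.get), scores


toy_categories = {"pets": ["dog", "leash"], "forestry": ["oak", "timber"]}
toy_doc = ["cat", "food"]

_, plain_scores = classify(toy_doc, toy_categories, use_hypernyms=False)
label, expanded_scores = classify(toy_doc, toy_categories, use_hypernyms=True)
print(plain_scores, expanded_scores, label)
```

Without expansion the toy document shares no keyword with either category; with expansion, "cat" and "dog" meet at "animal", so the document matches the "pets" category. The same mechanism can also add misleading overlaps, which is consistent with the abstract's finding that gains depend on the classifier and corpus.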
In the literature, various approaches have been proposed to address the domain adaptation problem in sentiment classification (also called cross-domain sentiment classification). However, adaptation performance usually suffers considerably when the data distributions in the source and target domains differ significantly. In this paper, we suggest performing active learning for cross-domain sentiment classification by actively selecting a small amount of labeled data in the target domain. Accordingly, we propose a novel active learning approach for cross-domain sentiment classification. First, we train two individual classifiers, i.e., the source and target classifiers, with the labeled data from the source and target domains respectively. Second, the two classifiers are employed to select informative samples with the Query By Committee (QBC) selection strategy. Third, the two classifiers are combined to make the classification decision. Importantly, the two classifiers are trained by fully exploiting the unlabeled data in the target domain with the label propagation (LP) algorithm. Empirical studies demonstrate the effectiveness of our active learning approach for cross-domain sentiment classification over some strong baselines.
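The selection and combination steps can be sketched in a few lines. This is not the authors' implementation: the abstract does not say how committee disagreement is measured or how the two classifiers are combined, so the absolute probability gap and the simple averaging used below are assumptions, the label propagation training step is omitted, and the probabilities are hypothetical toy values.

```python
from typing import List


def qbc_select(p_source: List[float], p_target: List[float], k: int) -> List[int]:
    """Return indices of the k unlabeled samples on which the two classifiers disagree most.

    p_source[i] and p_target[i] are each classifier's estimated probability that
    unlabeled target-domain sample i is positive.
    """
    # Assumed disagreement measure: absolute gap between the two probabilities.
    disagreement = [abs(a - b) for a, b in zip(p_source, p_target)]
    ranked = sorted(range(len(disagreement)), key=lambda i: disagreement[i], reverse=True)
    return ranked[:k]


def combine(p_source: List[float], p_target: List[float]) -> List[int]:
    """Assumed combination rule: average the two probabilities and threshold at 0.5."""
    return [1 if (a + b) / 2 >= 0.5 else 0 for a, b in zip(p_source, p_target)]


# Hypothetical classifier outputs for four unlabeled target-domain reviews.
toy_p_source = [0.9, 0.2, 0.6, 0.1]
toy_p_target = [0.8, 0.7, 0.3, 0.2]

to_label = qbc_select(toy_p_source, toy_p_target, k=2)
predictions = combine(toy_p_source, toy_p_target)
print(to_label, predictions)
```

In a full active learning loop, the samples returned by `qbc_select` would be sent to an annotator, added to the target-domain training set, and both classifiers would be retrained before the next round.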