Why do Users Tag? Detecting Users’ Motivation for Tagging in Social Tagging Systems

AAAI Conferences

While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question: 1.) What motivates users to tag resources, and in what ways is user motivation amenable to quantitative analysis? 2.) Does users' motivation for tagging vary within and across social tagging systems, and if so how? and 3.) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply the measures to datasets from 8 different tagging systems. Our results show that a) users' motivation for tagging varies not only across, but also within tagging systems, and that b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for (i) the development of tag recommenders, (ii) the analysis of tag semantics and (iii) the design of search algorithms for social tagging systems.


Classifier Calibration for Multi-Domain Sentiment Classification

AAAI Conferences

Textual sentiment classifiers classify texts into a fixed number of affective classes, such as positive, negative or neutral sentiment, or subjective versus objective information. It has been observed that sentiment classifiers suffer from a lack of generalization capability: a classifier trained on a certain domain generally performs worse on data from another domain. This phenomenon has been attributed to domain-specific affective vocabulary. In this paper, we propose a voting-based thresholding approach, which calibrates a number of existing single-domain classifiers with respect to sentiment data from a new domain. The approach presupposes only a small amount of annotated data from the new domain. We evaluate three criteria for estimating thresholds, and discuss the ramifications of these criteria for the trade-off between classifier performance and manual annotation effort.


Effective Question Recommendation Based on Multiple Features for Question Answering Communities

AAAI Conferences

We propose a new method of recommending questions to answerers so as to suit the answerers’ knowledge and interests in User-Interactive Question Answering (QA) communities. A question recommender can help answerers select the questions that interest them. This increases the number of answers, which in turn makes QA communities more active. An effective question recommender should satisfy the following three requirements: First, its accuracy should be higher than that of the existing category-based approach; more than 50% of answerers select the questions to answer according to a fixed system of categories. Second, it should be able to recommend unanswered questions, because more than 2,000 questions are posted every day. Third, it should be able to support even those people who have never answered a question previously, because more than 50% of users in current QA communities have never given any answer. To achieve an effective question recommender, we use the question histories as well as the answer histories of each user by combining collaborative filtering schemes and content-based filtering schemes. Experiments on real log data sets of a famous Japanese QA community, Oshiete goo, show that our recommender satisfies the three requirements.


Widespread Worry and the Stock Market

AAAI Conferences

Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
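As a rough illustration of the Granger-causal idea mentioned in this abstract, the sketch below tests whether lagged values of one series improve a one-lag autoregression of another. It is a minimal sketch, not the authors' procedure: the single lag, the helper names `ols_rss` and `granger_f`, and the synthetic `driver` and `follower` series are all assumptions made for illustration, and no LiveJournal or S&P 500 data is involved.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via the normal equations."""
    p = len(rows[0])
    # Build X^T X and X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    # Gaussian elimination with partial pivoting (assumes a non-singular system).
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, p):
            f = a[row][col] / a[col][col]
            for c in range(col, p):
                a[row][c] -= f * a[col][c]
            b[row] -= f * b[col]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - s) / a[i][i]
    return sum((t - sum(w * x for w, x in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag (hypothetical helper)."""
    targets = y[1:]
    # Restricted model: y_t explained by its own past only.
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    # Unrestricted model: the past of x is added as a regressor.
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic toy series (an assumption, not data from the paper):
# `follower` depends on the previous value of `driver` plus small noise.
rng = random.Random(42)
driver = [rng.gauss(0, 1) for _ in range(200)]
follower = [0.0] + [-0.8 * driver[t - 1] + 0.1 * rng.gauss(0, 1)
                    for t in range(1, 200)]

f_forward = granger_f(driver, follower)   # driver -> follower
f_backward = granger_f(follower, driver)  # follower -> driver
```

A large F statistic in one direction and a small one in the other is the pattern a Granger-style analysis looks for; a real study would choose the lag length carefully and compare the statistic against the appropriate F distribution.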


A Comparison of Information Seeking Using Search Engines and Social Networks

AAAI Conferences

The Web has become an important information repository; often it is the first source a person turns to with an information need. One common way to search the Web is with a search engine. However, it is not always easy for people to find what they are looking for with keyword search, and at times the desired information may not be readily available online. An alternative, facilitated by the rise of social media, is to pose a question to one's online social network. In this paper, we explore the pros and cons of using a social networking tool to fill an information need, as compared with a search engine. We describe a study in which 12 participants searched the Web while simultaneously posing a question on the same topic to their social network, and we compare the results they found by each method.


Study of Static Classification of Social Spam Profiles in MySpace

AAAI Conferences

Reaching hundreds of millions of users, major social networks have become important target media for spammers. Although practical techniques such as collaborative filters and behavioral analysis are able to reduce spam, they have an inherent lag (to collect sufficient data on the spammer) that also limits their effectiveness. Through an experimental study of over 1.9 million MySpace profiles, we make a case for analysis of static user profile content, possibly as soon as such profiles are created. We compare several machine learning algorithms in their ability to distinguish spam profiles from legitimate profiles. We found that a C4.5 decision tree algorithm achieves the highest accuracy (99.4%) in finding rogue profiles, while naïve Bayes achieves a lower accuracy (92.6%). We also conducted a sensitivity analysis of the algorithms with respect to features that may be easily removed by spammers.


Socio-Legal Analysis of Criminal Sentences: A Preliminary Study

AAAI Conferences

This paper discusses research that analyzes criminal sentences from trials on organized crime activity in Sicily, pronounced from 2000 through 2006. In Italy, the collection of large datasets of criminal sentences is severely constrained for various reasons, such as the difficulty of collecting data at the courthouses, the unavailability of data in digital format, and the classification criteria used in the public archives. Thus, in general, judicial statistics suffer from a lack of reliability and informativeness. The objective of this research is to analyze the text of criminal sentences in a revisable and verifiable way, so that information is extracted on the trial leading to the sentence, the socio-economic environment in which the relevant events occurred, and the differences between the various districts conducting the trials. The purpose is to develop a tool for automated analysis of the text of the sentences that is generalizable to other areas of jurisprudence, and, outside of jurisprudence, to other temporal and geographical contexts. The 726 criminal sentences, which have been converted into text files, were pronounced at all judicial levels in the four Sicilian districts for mafia-related crimes. This research is relevant because, for the first time in Italy, we aim to empirically describe the juridical response to the phenomenon of organized crime by using a large and extendable database of criminal sentences that can be analyzed with data mining techniques, rather than deriving general conclusions from a small, focused set of sentences.


Empirical Analysis of User Participation in Online Communities: the Case of Wikipedia

AAAI Conferences

We study the distribution of the activity period of users in five of the largest localized versions of the free, online encyclopedia Wikipedia. We find it to be consistent with a mixture of two truncated log-normal distributions. Using this model, the temporal evolution of these systems can be analyzed, showing that the statistical description is consistent over time.
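For readers unfamiliar with the model named in this abstract, a generic two-component mixture of truncated log-normal densities can be written as follows. The notation is an assumption for illustration only: the mixing weight $\pi$, the parameters $\mu_k, \sigma_k$, and the truncation bounds $a < b$ are not given in the abstract, and the paper may parameterize the model differently.

```latex
% Mixture density for an activity period t, with a <= t <= b
f(t) = \pi \, g(t; \mu_1, \sigma_1) + (1 - \pi) \, g(t; \mu_2, \sigma_2),
\qquad 0 \le \pi \le 1,

% Log-normal density truncated to [a, b]; \Phi is the standard normal CDF
g(t; \mu, \sigma) =
  \frac{\dfrac{1}{t \, \sigma \sqrt{2\pi}}
        \exp\!\left( -\dfrac{(\ln t - \mu)^2}{2 \sigma^2} \right)}
       {\Phi\!\left( \dfrac{\ln b - \mu}{\sigma} \right)
        - \Phi\!\left( \dfrac{\ln a - \mu}{\sigma} \right)},
\qquad a \le t \le b.
```

The denominator renormalizes each component so that it integrates to one over the truncated range.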


User Interest and Interaction Structure in Online Forums

AAAI Conferences

We present a new similarity measure tailored to posts in an online forum. Our measure takes into account all the available information about user interest and interaction — the content of posts, the threads in the forum, and the author of the posts. We use this post similarity to build a similarity between users, based on principal coordinate analysis. This allows easy visualization of the user activity as well. Similarity between users has numerous applications, such as clustering or classification. We show that including the author of a post in the post similarity has a smoothing effect on principal coordinate projections. We demonstrate our method on real data drawn from an internal corporate forum, and compare our results to those given by a standard document classification method. We conclude our method gives a more detailed picture of both the local and global network structure.
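Principal coordinate analysis, which this abstract uses to turn post similarity into user coordinates, can be sketched in a few lines. The code below is a minimal sketch of classical PCoA under stated assumptions: it takes a symmetric distance matrix as input, uses plain power iteration with deflation instead of a full eigensolver, and the function name `pcoa` and its parameters are hypothetical. It does not reproduce the authors' post similarity measure.

```python
import math
import random


def pcoa(dist, k=2, iters=500, seed=0):
    """Embed n items in k dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration for the current leading eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient gives the signed eigenvalue estimate.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        lam = sum(v[i] * bv[i] for i in range(n)) / vv
        if lam <= 1e-12:
            break  # remaining axes carry no positive variance
        unit = [x / math.sqrt(vv) for x in v]
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * unit[i]
            # Deflate so the next axis finds the next eigenvector.
            for j in range(n):
                b[i][j] -= lam * unit[i] * unit[j]
    return coords
```

Given distances between three points lying on a line at positions 0, 1 and 3, for example `pcoa([[0, 1, 3], [1, 0, 2], [3, 2, 0]])`, the returned coordinates reproduce those pairwise distances up to reflection and translation. A similarity matrix would first have to be converted to distances, and that conversion is a design choice the abstract does not specify.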


Toward Social Causality: An Analysis of Interpersonal Relationships in Online Blogs and Forums

AAAI Conferences

In this paper we present encouraging preliminary results into the problem of social causality (causal reasoning used by intelligent agents in a social environment) in online social interactions based on a model of reciprocity. At every level, social relationships are guided by the shared understanding that most actions call for appropriate reactions, and that inappropriate reactions require management. Thus, we present an analysis of interpersonal relationships in English reciprocal contexts. Specifically, we rely here on a large and recently built database of 10,882 reciprocal relation instances in online media. The resource is analyzed along a set of novel and important dimensions: symmetry, affective value, gender, and intentionality of action, which are highly interconnected. At a larger level, we automatically generate chains of causal relations between verbs indicating interpersonal relationships. Statistics along these dimensions give insights into people's behavior, judgments, and thus their social interactions.