Effective Question Recommendation Based on Multiple Features for Question Answering Communities
Kabutoya, Yutaka (NTT Cyber Solutions Laboratories, NTT Corporation) | Iwata, Tomoharu (NTT Cyber Solutions Laboratories, NTT Corporation) | Shiohara, Hisako (NTT Cyber Solutions Laboratories, NTT Corporation) | Fujimura, Ko (NTT Cyber Solutions Laboratories, NTT Corporation)
We propose a new method of recommending questions to answerers so as to suit the answerers’ knowledge and interests in User-Interactive Question Answering (QA) communities. A question recommender can help answerers select the questions that interest them. This increases the number of answers, which in turn makes QA communities more active. An effective question recommender should satisfy the following three requirements: First, its accuracy should be higher than that of the existing category-based approach; more than 50% of answerers select the questions to answer according to a fixed system of categories. Second, it should be able to recommend unanswered questions because more than 2,000 questions are posted every day. Third, it should be able to support even those people who have never answered a question previously, because more than 50% of users in current QA communities have never given any answer. To achieve an effective question recommender, we use question histories as well as the answer histories of each user by combining collaborative filtering schemes and content-based filtering schemes. Experiments on real log data sets of a famous Japanese QA community, Oshiete goo, show that our recommender satisfies the three requirements.
Temporal Correlation between Social Tags and Emerging Long-Term Trend Detection
Hsu, Ming-Hung (National Taiwan University) | Chang, Yu-Hui (National Taiwan University) | Chen, Hsin-Hsi (National Taiwan University)
Social annotation has become a popular way for web users to manage and share their information and interests. Because users' interests vary with time, the correlation between tags also changes from the users' perspective. In this work, we explore four methods for estimating temporal correlation between social tags and detect whether a long-term trend emerges from the history of temporal correlation between two tags. Three types of trends are specified: steadily-shifting, stabilizing, and cyclic. To compare the results of the four estimation methods, an indirect evaluation is realized by applying detected trends to tag recommendation.
The Wisdom of Bookies? Sentiment Analysis Versus the NFL Point Spread
Hong, Yancheng (Hong Kong University of Science & Technology) | Skiena, Steven (Stony Brook University)
The American Football betting market provides a particularly attractive domain to study the nexus between public sentiment and the wisdom of crowds. In this paper, we present the first substantial study of the relationship between the NFL betting line and public opinion expressed in blogs and microblogs (Twitter). We perform a large-scale study of four distinct text streams: LiveJournal blogs, RSS blog feeds captured by Spinn3r, Twitter, and traditional news media. Our results show interesting disparities between the first and second halves of each season. We present evidence showing the usefulness of sentiment for NFL betting. We demonstrate that a strategy betting on roughly 30 games per year identified the winner roughly 60% of the time from 2006 to 2009, well beyond what is needed to overcome the bookie's typical commission (53%).
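The 53% threshold quoted in the abstract above can be checked with a little arithmetic. The sketch below assumes the conventional point-spread pricing in which a bettor risks 110 units to win 100; that pricing and the function names are assumptions made for illustration and are not stated in the abstract.

```python
def break_even_win_rate(risk: float = 110.0, win: float = 100.0) -> float:
    """Win probability at which the expected profit of a bet is zero.

    Solves p * win - (1 - p) * risk = 0 for p.
    The default 110/100 pricing is an assumption, not taken from the abstract.
    """
    return risk / (risk + win)


def expected_profit_per_bet(p: float, risk: float = 110.0, win: float = 100.0) -> float:
    """Expected profit of one bet when the bettor wins with probability p."""
    return p * win - (1.0 - p) * risk


if __name__ == "__main__":
    print(f"break-even win rate: {break_even_win_rate():.4f}")
    print(f"expected profit per bet at a 60% win rate: {expected_profit_per_bet(0.60):.2f}")
```

Under this assumed pricing the break-even rate is 110/210, a little over 52%, which is consistent with the rounded 53% in the abstract; a 60% win rate leaves a positive expected profit on each bet.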
Social Dynamics of Digg
Hogg, Tad (Independent Researcher) | Lerman, Kristina (USC Information Sciences Institute)
Online social media often highlight content that is highly rated by neighbors in a social network. For the news aggregator Digg, we use a stochastic model to distinguish the effect of the increased visibility from the network from how interesting content is to users. We find a wide range of interest, and distinguish stories primarily of interest to users in the network from those of more general interest to the user community. This distinction helps predict a story's eventual popularity from users' early reactions to the story.
The Perceived Credibility of Online Encyclopedias Among Children
Flanagin, Andrew J. (University of California, Santa Barbara) | Metzger, Miriam J. (University of California, Santa Barbara)
This study examined young people’s trust of Wikipedia as an information resource. A large-scale probability-based survey with embedded quasi-experiments was conducted with 2,747 children in the U.S. ranging from 11 to 18 years old. Results show that young people find Wikipedia to be fairly credible, but also exhibit an awareness of potential problems with non-expert, user-generated content in anonymous environments. Children tend to evaluate the credibility of online encyclopedia information with this in mind, at times with what appears to be an unwarranted devaluation of this information.
Empirical Analysis of User Participation in Online Communities: the Case of Wikipedia
Ciampaglia, Giovanni Luca (Università della Svizzera Italiana) | Vancheri, Alberto (Università della Svizzera Italiana)
We study the distribution of the activity period of users in five of the largest localized versions of the free, online encyclopedia Wikipedia. We find it to be consistent with a mixture of two truncated log-normal distributions. Using this model, the temporal evolution of these systems can be analyzed, showing that the statistical description is consistent over time.
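To make the model family named in the abstract above concrete, here is a minimal sketch of the density of a mixture of two truncated log-normal distributions. The parameterization (a shared truncation interval, a single mixing weight) and all function names are assumptions for illustration; the abstract does not give the authors' exact formulation or fitting procedure.

```python
import math


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_lognormal_pdf(x, mu, sigma, lower, upper):
    """Density of a log-normal(mu, sigma) restricted to [lower, upper], lower > 0."""
    if x < lower or x > upper:
        return 0.0
    z = (math.log(x) - mu) / sigma
    # Untruncated log-normal density at x.
    base = math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))
    # Probability mass the untruncated distribution places on [lower, upper].
    mass = (_norm_cdf((math.log(upper) - mu) / sigma)
            - _norm_cdf((math.log(lower) - mu) / sigma))
    return base / mass


def mixture_pdf(x, weight, comp_a, comp_b, lower, upper):
    """Two-component mixture; comp_a and comp_b are (mu, sigma) pairs."""
    mu_a, sigma_a = comp_a
    mu_b, sigma_b = comp_b
    return (weight * truncated_lognormal_pdf(x, mu_a, sigma_a, lower, upper)
            + (1.0 - weight) * truncated_lognormal_pdf(x, mu_b, sigma_b, lower, upper))
```

Dividing each component by the mass it places on the truncation interval keeps every component, and therefore the mixture, normalized on that interval.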
“How Incredibly Awesome!” — Click Here to Read More
Ahn, Hyung-il (Massachusetts Institute of Technology) | Geyer, Werner (IBM) | Dugan, Casey (IBM) | Millen, David R. (IBM)
We investigate the impact of a discussion snippet's overall sentiment on a user's willingness to read more of a discussion. Using sentiment analysis, we constructed positive, neutral, and negative discussion snippets using the discussion topic and a sample comment from discussions taking place around content on an enterprise social networking site. We computed personalized snippet recommendations for a subset of users and conducted a survey to test how these recommendations were perceived. Our experimental results show that snippets with high sentiments are better discussion "teasers."
Star Quality: Aggregating Reviews to Rank Products and Merchants
McGlohon, Mary (Carnegie Mellon University, Google, Inc.) | Glance, Natalie (Google, Inc.) | Reiter, Zach (Google, Inc.)
Given a set of reviews of products or merchants from a wide range of authors and several review websites, how can we measure the true quality of the product or merchant? How do we remove the bias of individual authors or sources? How do we compare reviews obtained from different websites, where ratings may be on different scales (1-5 stars, A/B/C, etc.)? How do we filter out unreliable reviews to use only the ones with "star quality"? Taking into account these considerations, we analyze data sets from a variety of different review sites (the first paper, to our knowledge, to do this). These data sets include 8 million product reviews and 1.5 million merchant reviews. We explore statistic- and heuristic-based models for estimating the true quality of a product or merchant, and compare the performance of these estimators on the task of ranking pairs of objects. We also apply the same models to the task of using Netflix ratings data to rank pairs of movies, and discover that the performance of the different models is surprisingly similar on this data set.
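As a rough illustration of the scale and bias questions raised in the abstract above, the sketch below maps star and letter ratings onto a common 0-to-1 scale and then subtracts each source's mean rating. This is not the authors' model, which the abstract does not describe; the letter-grade mapping, the function names, and the mean-centering step are all assumptions chosen for simplicity.

```python
from statistics import mean

# Assumed mapping of letter grades onto [0, 1]; purely illustrative.
LETTER_SCALE = {"C": 0.0, "B": 0.5, "A": 1.0}


def normalize_stars(stars: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Linearly rescale a star rating from [lo, hi] to [0, 1]."""
    return (stars - lo) / (hi - lo)


def normalize_letter(grade: str) -> float:
    """Map a letter grade to [0, 1] using the assumed LETTER_SCALE."""
    return LETTER_SCALE[grade]


def debias_by_source(ratings):
    """Subtract each source's mean score from its ratings.

    ratings: list of (source, item, score) tuples with score already in [0, 1].
    Returns a list of (source, item, centered_score) tuples in the same order.
    """
    by_source = {}
    for source, _item, score in ratings:
        by_source.setdefault(source, []).append(score)
    source_mean = {s: mean(scores) for s, scores in by_source.items()}
    return [(source, item, score - source_mean[source])
            for source, item, score in ratings]
```

Mean-centering is only one simple way to account for a source that rates systematically high or low; it says nothing about filtering unreliable reviews, which the abstract treats as a separate question.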
Widespread Worry and the Stock Market
Gilbert, Eric (University of Illinois at Urbana-Champaign) | Karahalios, Karrie (University of Illinois at Urbana-Champaign)
Our emotional state influences our choices. Research on how it happens usually comes from the lab. We know relatively little about how real world emotions affect real world settings, like financial markets. Here, we demonstrate that estimating emotions from weblogs provides novel information about future stock market prices. That is, it provides information not already apparent from market data. Specifically, we estimate anxiety, worry and fear from a dataset of over 20 million posts made on the site LiveJournal. Using a Granger-causal framework, we find that increases in expressions of anxiety, evidenced by computationally-identified linguistic features, predict downward pressure on the S&P 500 index. We also present a confirmation of this result via Monte Carlo simulation. The findings show how the mood of millions in a large online community, even one that primarily discusses daily life, can anticipate changes in a seemingly unrelated system. Beyond this, the results suggest new ways to gauge public opinion and predict its impact.
Study of Static Classification of Social Spam Profiles in MySpace
Irani, Danesh (Georgia Institute of Technology) | Webb, Steve (Georgia Institute of Technology) | Pu, Calton (Georgia Institute of Technology)
Reaching hundreds of millions of users, major social networks have become important target media for spammers. Although practical techniques such as collaborative filters and behavioral analysis are able to reduce spam, they have an inherent lag (to collect sufficient data on the spammer) that also limits their effectiveness. Through an experimental study of over 1.9 million MySpace profiles, we make a case for analysis of static user profile content, possibly as soon as such profiles are created. We compare several machine learning algorithms in their ability to distinguish spam profiles from legitimate profiles. We found that a C4.5 decision tree algorithm achieves the highest accuracy (99.4%) in finding rogue profiles, while naïve Bayes achieves a lower accuracy (92.6%). We also conducted a sensitivity analysis of the algorithms with respect to features that may be easily removed by spammers.