Measuring the Efficiency of Charitable Giving with Content Analysis and Crowdsourcing

AAAI Conferences

In the U.S., individuals give more than 200 billion dollars to over 50 thousand charities each year, yet how people make these choices is not well understood. In this study, we use data from CharityNavigator.org and web browsing data from the Bing toolbar to understand charitable giving choices. Our main goal is to use data on charities' overhead expenses to better understand efficiency in the charity marketplace. A preliminary analysis indicates that the average donor is "wasting" more than 15% of their contribution by opting for poorly run organizations as opposed to higher rated charities in the same Charity Navigator categorical group. However, charities within these groups may not represent good substitutes for each other. We use text analysis to identify substitutes for charities based on their stated missions and validate these substitutes with crowd-sourced labels. Using these similarity scores, we simulate market outcomes using web browsing and revenue data. With more realistic similarity requirements, the estimated loss drops by 75%: much of what looked like inefficient giving can be explained by crowd-validated similarity requirements that are not fulfilled by most charities within the same category. A choice experiment helps us further investigate the extent to which a recommendation system could impact the market. The results indicate that money could be redirected away from the long tail of inefficient organizations. If widely adopted, the savings would be in the billions of dollars, highlighting the role the web could have in shaping this important market.
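
The abstract does not say which text-analysis method produces the similarity scores. As a rough illustration only, the sketch below scores mission statements with TF-IDF weighting and cosine similarity, one common choice that is assumed here and not taken from the paper. The charity names, mission texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: term_freq[t] * idf[t] for t in term_freq})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical mission statements, invented for illustration.
missions = {
    "Food Bank A": "We provide meals and groceries to hungry families in need",
    "Food Bank B": "Providing groceries and hot meals to families facing hunger",
    "Wildlife Trust": "We protect endangered wildlife and restore natural habitats",
}

names = list(missions)
vecs = tfidf_vectors([missions[name] for name in names])
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], "vs", names[j], round(cosine(vecs[i], vecs[j]), 3))
```

In a setup like this, a pair of charities would count as substitutes only if their score clears some threshold. The abstract indicates that such scores were validated against crowd-sourced labels, which a sketch of this kind does not attempt.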


Human eyes assist drones, teach machines to see

#artificialintelligence

Drone images accumulate much faster than they can be analyzed. Researchers have developed a new approach that combines crowdsourcing and machine learning to speed up the process. Who would win in a real-life game of "Where's Waldo," humans or computers? A recent study suggests that when speed and accuracy are critical, an approach combining both human and machine intelligence would take the prize. With drones being used to monitor everything from natural disaster sites to pollution and wildlife populations, analyzing drone images in real time has become a critically important big data challenge.


Quality Expectation-Variance Tradeoffs in Crowdsourcing Contests

AAAI Conferences

We examine designs for crowdsourcing contests, where participants compete for rewards given to superior solutions of a task. We theoretically analyze tradeoffs between the expectation and variance of the principal's utility (i.e. the best solution's quality), and empirically test our theoretical predictions using a controlled experiment on Amazon Mechanical Turk. Our evaluation method is also crowdsourcing based and relies on the peer prediction mechanism. Our theoretical analysis shows an expectation-variance tradeoff of the principal's utility in such contests through a Pareto efficient frontier. In particular, we show that the simple contest with 2 authors and the 2-pair contest have good theoretical properties. In contrast, our empirical results show that the 2-pair contest is the superior design among all designs tested, achieving the highest expectation and lowest variance of the principal's utility.


Loneliness in a Connected World: Analyzing Online Activity and Expressions on Real Life Relationships of Lonely Users

AAAI Conferences

Although loneliness is a very familiar emotion, little is known about it. One aspect worth exploring is the prevalence of loneliness in the connected world that social media sites like Twitter provide. In light of this, this study investigates the Twitter data of users who have expressed loneliness to understand the phenomenon. Since our primary material is tweets, we developed various indices that can measure social activities reflected in online relationships and real-life relationships solely through online Twitter data. Through these indices, the relations between social activity and loneliness were investigated. The results show that highly lonely users seem to have low online activity, high positive expressions on real-life relationships, and narrow ingroups.


A Data-Driven Study of View Duration on YouTube

AAAI Conferences

Video watching has emerged as one of the most frequent media activities on the Internet. Yet, little is known about how users watch online video. Using two distinct YouTube datasets, a set of random YouTube videos crawled from the Web and a set of videos watched by participants tracked by a Chrome extension, we examine whether and how indicators of collective preferences and reactions are associated with view duration of videos. We show that video view duration is positively associated with the video's view count, the number of likes per view, and the negative sentiment in the comments. These metrics and reactions have significant predictive power over how long individuals watch a video. Our findings provide a more precise understanding of user engagement with video content in social media beyond view count.