IBM India Research Lab
Analysis of Sampling Algorithms for Twitter
Palguna, Deepan Subrahmanian (Purdue University) | Joshi, Vikas (IBM India Research Lab) | Chakaravarthy, Venkatesan (IBM India Research Lab) | Kothari, Ravi (IBM India Research Lab) | Subramaniam, LV (IBM India Research Lab)
Twitter carries around 500 million Tweets a day, and the impact of this data on applications such as public safety, opinion mining, and news broadcast is growing steadily. Analyzing large volumes of Tweets for these applications requires techniques that scale well with the number of Tweets. In this work we present a theoretical formulation for sampling Twitter data. We introduce novel statistical metrics that quantify how representative a Tweet sample is, and derive sufficient conditions on the number of samples needed to obtain highly representative Tweet samples. These metrics quantify the representativeness, or goodness, of the sample in terms of frequent keyword identification and in terms of restoring the public sentiments associated with these keywords. Our algorithm is uniform random sampling with replacement; sampling could serve as a first step before more sophisticated summarization methods are used to generate summaries for human use. We show that experiments conducted on real Twitter data agree with our bounds. In these experiments, we also compare different kinds of random sampling algorithms. Our bounds are attractive since they do not depend on the total number of Tweets in the universe. Although our ideas and techniques are specific to Twitter, they could find applications in other areas as well.
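The sampling step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, and the threshold `theta` are all assumptions made for illustration, and the paper's actual metrics and sample-size bounds are not reproduced here. The sketch draws a uniform random sample with replacement and checks which keywords appear in at least a fraction `theta` of the sampled Tweets.

```python
import random
from collections import Counter


def sample_with_replacement(tweets, k, seed=None):
    """Draw k Tweets uniformly at random, with replacement."""
    rng = random.Random(seed)
    return rng.choices(tweets, k=k)


def keyword_frequencies(tweets):
    """Fraction of Tweets that contain each keyword.

    Assumption: keywords are lower-cased whitespace tokens, and a keyword
    is counted at most once per Tweet.
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(set(tweet.lower().split()))
    total = len(tweets)
    return {word: count / total for word, count in counts.items()}


def frequent_keywords(tweets, theta):
    """Keywords that occur in at least a fraction theta of the Tweets."""
    freqs = keyword_frequencies(tweets)
    return {word for word, freq in freqs.items() if freq >= theta}


if __name__ == "__main__":
    # Toy universe standing in for a real Tweet collection.
    universe = ["rain again"] * 600 + ["snow now"] * 50 + ["quiet day"] * 350
    sample = sample_with_replacement(universe, k=500, seed=0)
    print("frequent in universe:", sorted(frequent_keywords(universe, 0.45)))
    print("frequent in sample:  ", sorted(frequent_keywords(sample, 0.45)))
```

Comparing the two printed sets gives an informal sense of what "representative in terms of frequent keyword identification" means: a good sample should recover the keywords that are frequent in the full collection.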
Threats and Trade-Offs in Resource Critical Crowdsourcing Tasks Over Networks
Nath, Swaprava (Indian Institute of Science, Bangalore) | Dayama, Pankaj (Global General Motors R&D – India Science Lab) | Garg, Dinesh (IBM India Research Lab) | Narahari, Y. (Indian Institute of Science) | Zou, James (Harvard University)
In recent times, crowdsourcing over social networks has emerged as an active tool for complex task execution. In this paper, we address the problem faced by a planner who must incentivize agents in the network both to execute a task and to help recruit other agents for this purpose. We study this mechanism design problem under two natural resource optimization settings: (1) cost critical tasks, where the planner's goal is to minimize the total cost, and (2) time critical tasks, where the goal is to minimize the total time elapsed before the task is executed. We define a set of fairness properties that a crowdsourcing mechanism should ideally satisfy. We prove that no mechanism can satisfy all these properties simultaneously. We relax some of these properties and define their approximate counterparts. Under appropriate approximate fairness criteria, we obtain a non-trivial family of payment mechanisms. Moreover, we provide precise characterizations of cost critical and time critical mechanisms.
Social Navigation through the Spoken Web: Improving Audio Access through Collaborative Filtering in Gujarat, India
Farrell, Robert (IBM Research) | Das, Rajarshi (IBM Research) | Rajput, Nitendra (IBM India Research Lab)
The rapid uptake of mobile phones, cheaper and more widespread mobile connectivity, and increasing familiarity with technology are driving Internet adoption in developing nations, but major hurdles still remain. First, today's Internet is mostly in English and is thus largely inaccessible to billions of people for whom English is not a native or second language. Second, today's Internet is accessible largely through text-based technologies (web browsing, email, text ...

Given the potentially large number of users of the Spoken Web system and the likelihood of shared information needs and significant user similarities, we expect considerable improvements in audio navigation from using CF. A useful distinction among CF-based approaches arises from the types of data used to associate users to products and other items. In some scenarios, users may provide explicit feedback about their interest in products through ratings.
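To make the explicit-feedback case concrete, here is a minimal sketch of user-based collaborative filtering over explicit ratings. It is an illustration under stated assumptions, not the method used in the Spoken Web system: the function names, the cosine similarity measure, and the similarity-weighted average are hypothetical choices, and the sample ratings are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for an item.

    Returns None when no other user with non-zero similarity has rated it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total > 0 else None


if __name__ == "__main__":
    # Invented explicit ratings (1-5) of audio clips by three listeners.
    ratings = {
        "a": {"clip_x": 5, "clip_y": 1},
        "b": {"clip_x": 5, "clip_y": 1, "clip_z": 4},
        "c": {"clip_x": 1, "clip_y": 5, "clip_z": 2},
    }
    print(predict_rating(ratings, "a", "clip_z"))
```

Because listener `a` rates clips much as `b` does, the prediction for `clip_z` is pulled toward `b`'s rating rather than `c`'s.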