Goto

Collaborating Authors

 Industry


Predicting Disease Transmission from Geo-Tagged Micro-Blog Data

AAAI Conferences

These results far outperform alternative models. This work is an important step towards the development Recent work has demonstrated that micro-blogging data can of automated methods that identify disease vectors, trace the be used to predict a variety of phenomena, including movie transmission between concrete individuals, and ultimately box-office revenues (Asur and Huberman 2010), elections help us understand and predict the spread of infectious diseases (Tumasjan et al. 2010), and flu epidemics (Lampos, De Bie, with fine granularity. It provides a foundation for and Cristianini 2010). Most research to date has focused on research on fundamental questions of public health, such predicting aggregate properties of the population from the as: How does an epidemic on a population scale emerge activity of the bloggers. A different kind of problem one can from low-level interactions between people in the course pose, however, is to predict the behavior or state of particular of their everyday lives? Can we identify a potentially noncooperative individuals within the social network. For instance, one individual who is a vector of a dangerous disease, could try to predict whether a person will go to a movie or i.e., a "Typhoid Mary"? What is the interaction between vote for a particular candidate based on micro-blog data. The friendship, location, and co-location in the spread of individual's own data may or may not be accessible.


Adaptive Polling for Information Aggregation

AAAI Conferences

The flourishing of online labor markets such as Amazon Mechanical Turk (MTurk) makes it easy to recruit many workers for solving small tasks. We study whether information elicitation and aggregation over a combinatorial space can be achieved by integrating small pieces of potentially imprecise information, gathered from a large number of workers through simple, one-shot interactions in an online labor market. We consider the setting of predicting the ranking of $n$ competing candidates, each having a hidden underlying strength parameter. At each step, our method estimates the strength parameters from the collected pairwise comparison data and adaptively chooses another pairwise comparison question for the next recruited worker. Through an MTurk experiment, we show that the adaptive method effectively elicits and aggregates information, outperforming a naive method using a random pairwise comparison question at each step.


Multinomial Relation Prediction in Social Data: A Dimension Reduction Approach

AAAI Conferences

The recent popularization of social web services has made them one of the primary uses of the World Wide Web. An important concept in social web services is social actions such as making connections and communicating with others and adding annotations to web resources. Predicting social actions would improve many fundamental web applications, such as recommendations and web searches. One remarkable characteristic of social actions is that they involve multiple and heterogeneous objects such as users, documents, keywords, and locations. However, the high-dimensional property of such multinomial relations poses one fundamental challenge, that is, predicting multinomial relations with only a limited amount of data. In this paper, we propose a new multinomial relation prediction method, which is robust to data sparsity. We transform each instance of a multinomial relation into a set of binomial relations between the objects and the multinomial relation of the involved objects. We then apply an extension of a low-dimensional embedding technique to these binomial relations, which results in a generalized eigenvalue problem guaranteeing global optimal solutions. We also incorporate attribute information as side information to address the “cold start” problem in multinomial relation prediction. Experiments with various real-world social web service datasets demonstrate that the proposed method is more robust against data sparseness as compared to several existing methods, which can only find sub-optimal solutions.


Dynamically Switching between Synergistic Workflows for Crowdsourcing

AAAI Conferences

To ensure quality results from unreliable crowdsourced workers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they experiment with several alternative workflows to accomplish the task, and eventually deploy the one that achieves the best performance during early trials. Surprisingly, this seemingly natural design paradigm does not achieve the full potential of crowdsourcing. In particular, using a single workflow (even the best) to accomplish a task is suboptimal. We show that alternative workflows can compose synergistically to yield much higher quality output. We formalize the insight with a novel probabilistic graphical model. Based on this model, we design and implement AGENTHUNT, a POMDP-based controller that dynamically switches between these workflows to achieve higher returns on investment. Additionally, we design offline and online methods for learning model parameters. Live experiments on Amazon Mechanical Turk demonstrate the superiority of AGENTHUNT for the task of generating NLP training data, yielding up to 50% error reduction and greater net utility compared to previous methods.


ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback

AAAI Conferences

During broadcast events such as the Superbowl, the U.S. Presidential and Primary debates, etc., Twitter has become the de facto platform for crowds to share perspectives and commentaries about them. Given an event and an associated large-scale collection of tweets, there are two fundamental research problems that have been receiving increasing attention in recent years. One is to extract the topics covered by the event and the tweets; the other is to segment the event. So far these problems have been viewed separately and studied in isolation. In this work, we argue that these problems are in fact inter-dependent and should be addressed together. We develop a joint Bayesian model that performs topic modeling and event segmentation in one unified framework. We evaluate the proposed model both quantitatively and qualitatively on two large-scale tweet datasets associated with two events from different domains to show that it improves significantly over baseline models.


Music-Inspired Texture Representation

AAAI Conferences

Techniques for music recommendation are increasingly relying on hybrid representations to retrieve new and exciting music. A key component of these representations is musical content, with texture being the most widely used feature. Current techniques for representing texture however are inspired by speech, not music, therefore music representations are not capturing the correct nature of musical texture. In this paper we investigate two parts of the well-established mel-frequency cepstral coefficients (MFCC) representation: the resolution of mel-frequencies related to the resolution of musical notes; and how best to describe the shape of texture. Through contextualizing these parts, and their relationship to music, a novel music-inspired texture representation is developed. We evaluate this new texture representation by applying it to the task of music recommendation. We use the representation to build three recommendation models, based on current state-of-the-art methods. Our results show that by understanding two key parts of texture representation, it is possible to achieve a significant recommendation improvement. This contribution of a music-inspired texture representation will not only improve content-based representation, but will allow hybrid systems to take advantage of a stronger content component.


Online Task Assignment in Crowdsourcing Markets

AAAI Conferences

We explore the problem of assigning heterogeneous tasks to workers with different, unknown skill sets in crowdsourcing markets such as Amazon Mechanical Turk. We first formalize the online task assignment problem, in which a requester has a fixed set of tasks and a budget that specifies how many times he would like each task completed. Workers arrive one at a time (with the same worker potentially arriving multiple times), and must be assigned to a task upon arrival. The goal is to allocate workers to tasks in a way that maximizes the total benefit that the requester obtains from the completed work. Inspired by recent research on the online adwords problem, we present a two-phase exploration-exploitation assignment algorithm and prove that it is competitive with respect to the optimal offline algorithm which has access to the unknown skill levels of each worker. We empirically evaluate this algorithm using data collected on Mechanical Turk and show that it performs better than random assignment or greedy algorithms. To our knowledge, this is the first work to extend the online primal-dual technique used in the online adwords problem to a scenario with unknown parameters, and the first to offer an empirical validation of an online primal-dual algorithm.


Querying Linked Ontological Data through Distributed Summarization

AAAI Conferences

As the semantic web expands, ontological data becomes distributed over a large network of data sources on the Web. Consequently, evaluating queries that aim to tap into this distributed semantic database necessitates the ability to consult multiple data sources efficiently. In this paper, we propose methods and heuristics to efficiently query distributed ontological data based on a series of properties of summarized data. In our approach, each source summarizes its data as another RDF graph, and relevant section of these summaries are merged and analyzed at query evaluation time. We show how the analysis of these summaries enables more efficient source selection, query pruning and transformation of expensive distributed joins into local joins.


Fused Matrix Factorization with Geographical and Social Influence in Location-Based Social Networks

AAAI Conferences

Recently, location-based social networks (LBSNs), such as Gowalla, Foursquare, Facebook, and Brightkite, etc., have attracted millions of users to share their social friendship and their locations via check-ins. The available check-in information makes it possible to mine users’ preference on locations and to provide favorite recommendations. Personalized Point-of-interest (POI) recommendation is a significant task in LBSNs since it can help targeted users explore their surroundings as well as help third-party developers to provide personalized services. To solve this task, matrix factorization is a promising tool due to its success in recommender systems. However, previously proposed matrix factorization (MF) methods do not explore geographical influence, e.g., multi-center check-in property, which yields suboptimal solutions for the recommendation. In this paper, to the best of our knowledge, we are the first to fuse MF with geographical and social influence for POI recommendation in LBSNs. We first capture the geographical influence via modeling the probability of a user’s check-in on a location as a Multi-center Gaussian Model (MGM). Next, we include social information and fuse the geographical influence into a generalized matrix factorization framework. Our solution to POI recommendation is efficient and scales linearly with the number of observations. Finally, we conduct thorough experiments on a large-scale real-world LBSNs dataset and demonstrate that the fused matrix factorization framework with MGM utilizes the distance information sufficiently and outperforms other state-of-the-art methods significantly.


Local stability of Belief Propagation algorithm with multiple fixed points

arXiv.org Machine Learning

A number of problems in statistical physics and computer science can be expressed as the computation of marginal probabilities over a Markov random field. Belief propagation, an iterative message-passing algorithm, computes exactly such marginals when the underlying graph is a tree. But it has gained its popularity as an efficient way to approximate them in the more general case, even if it can exhibits multiple fixed points and is not guaranteed to converge. In this paper, we express a new sufficient condition for local stability of a belief propagation fixed point in terms of the graph structure and the beliefs values at the fixed point. This gives credence to the usual understanding that Belief Propagation performs better on sparse graphs.