AITopics

1302.3639

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Machine LearningMar-26-2013

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Karger, David R., Oh, Sewoong, Shah, Devavrat

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask to the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and can not be exploited. Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.

algorithm, crowdsourcing, social media, (21 more...)

1110.3564

Country:

North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.86)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Neural Information Processing SystemsDec-31-2012

Iterative ranking from pair-wise comparisons

Negahban, Sahand, Oh, Sewoong, Shah, Devavrat

The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1].

algorithm, artificial intelligence, chess, (18 more...)

Country:

North America > United States > Massachusetts (0.14)
North America > United States > New York (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Chess (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

arXiv.org Artificial IntelligenceJul-11-2012

Belief Propagation for Min-cost Network Flow: Convergence and Correctness

Gamarnik, David, Shah, Devavrat, Wei, Yehua

Message passing type algorithms such as the so-called Belief Propagation algorithm have recently gained a lot of attention in the statistics, signal processing and machine learning communities as attractive algorithms for solving a variety of optimization and inference problems. As a decentralized, easy to implement and empirically successful algorithm, BP deserves attention from the theoretical standpoint, and here not much is known at the present stage. In order to fill this gap we consider the performance of the BP algorithm in the context of the capacitated minimum-cost network flow problem - the classical problem in the operations research field. We prove that BP converges to the optimal solution in the pseudo-polynomial time, provided that the optimal solution of the underlying problem is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for the same problem, which no longer requires the uniqueness of the optimal solution. This is the first instance where BP is proved to have fully-polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method to solve an important class of optimization problems.

artificial intelligence, optimal solution, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

1004.1586

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Neural Information Processing SystemsDec-31-2011

Iterative Learning for Reliable Crowdsourcing Systems

Karger, David R., Oh, Sewoong, Shah, Devavrat

Crowdsourcing systems, in which tasks are electronically distributed to numerous ``information piece-workers'', have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such rowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers’ answers. We show that our algorithm significantly outperforms majority voting and, in fact, are asymptotically optimal through comparison to an oracle that knows the reliability of every worker.

algorithm, crowdsourcing, social media, (19 more...)

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Massachusetts (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.86)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)

arXiv.org Machine LearningSep-21-2011

Sparse Choice Models

Farias, Vivek F., Jagabathula, Srikanth, Shah, Devavrat

Choice models, which capture popular preferences over objects of interest, play a key role in making decisions whose eventual outcome is impacted by human choice behavior. In most scenarios, the choice model, which can effectively be viewed as a distribution over permutations, must be learned from observed data. The observed data, in turn, may frequently be viewed as (partial, noisy) information about marginals of this distribution over permutations. As such, the search for an appropriate choice model boils down to learning a distribution over permutations that is (near-)consistent with observed information about this distribution. In this work, we pursue a non-parametric approach which seeks to learn a choice model (i.e. a distribution over permutations) with {\em sparsest} possible support, and consistent with observed data. We assume that the data observed consists of noisy information pertaining to the marginals of the choice model we seek to learn. We establish that {\em any} choice model admits a `very' sparse approximation in the sense that there exists a choice model whose support is small relative to the dimension of the observed data and whose marginals approximately agree with the observed marginal information. We further show that under, what we dub, `signature' conditions, such a sparse approximation can be found in a computationally efficiently fashion relative to a brute force approach. An empirical study using the American Psychological Association election data-set suggests that our approach manages to unearth useful structural properties of the underlying choice model using the sparse approximation found. Our results further suggest that the signature condition is a potential alternative to the recently popularized Restricted Null Space condition for efficient recovery of sparse models.

artificial intelligence, choice model, optimization problem, (18 more...)

1011.4339

Country:

North America > United States > New York (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Machine LearningNov-2-2010

Community Detection in Networks: The Leader-Follower Algorithm

Shah, Devavrat, Zaman, Tauhid

Traditional spectral clustering methods cannot naturally learn the number of communities in a network and often fail to detect smaller community structure in dense networks because they are based upon external community connectivity properties such as graph cuts. We propose an algorithm for detecting community structure in networks called the leader-follower algorithm which is based upon the natural internal structure expected of communities in social networks. The algorithm uses the notion of network centrality in a novel manner to differentiate leaders (nodes which connect different communities) from loyal followers (nodes which only have neighbors within a single community). Using this approach, it is able to naturally learn the communities from the network structure and does not require the number of communities as an input, in contrast to other common methods such as spectral clustering. We prove that it will detect all of the communities exactly for any network possessing communities with the natural internal structure expected in social networks. More importantly, we demonstrate the effectiveness of the leader-follower algorithm in the context of various real networks ranging from social networks such as Facebook to biological networks such as an fMRI based human brain network. We find that the leader-follower algorithm finds the relevant community structure in these networks without knowing the number of communities beforehand. Also, because the leader-follower algorithm detects communities using their internal structure, we find that it can resolve a finer community structure in dense networks than common spectral clustering methods based on external community structure.

algorithm, neurology, social media, (21 more...)

1011.0774

Genre: Research Report (0.50)

Industry:

Information Technology > Services (0.96)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningOct-28-2010

Rumors in a Network: Who's the Culprit?

Shah, Devavrat, Zaman, Tauhid

We provide a systematic study of the problem of finding the source of a rumor in a network. We model rumor spreading in a network with a variant of the popular SIR model and then construct an estimator for the rumor source. This estimator is based upon a novel topological quantity which we term \textbf{rumor centrality}. We establish that this is an ML estimator for a class of graphs. We find the following surprising threshold phenomenon: on trees which grow faster than a line, the estimator always has non-trivial detection probability, whereas on trees that grow like a line, the detection probability will go to 0 as the network grows. Simulations performed on synthetic networks such as the popular small-world and scale-free networks, and on real networks such as an internet AS network and the U.S. electric power grid network, show that the estimator either finds the source exactly or within a few hops of the true source across different network topologies. We compare rumor centrality to another common network centrality notion known as distance centrality. We prove that on trees, the rumor center and distance center are equivalent, but on general networks, they may differ. Indeed, simulations show that rumor centrality outperforms distance centrality in finding rumor sources in networks which are not tree-like.

artificial intelligence, machine learning, node, (17 more...)

0909.4370

Genre: Research Report (0.63)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Communications > Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

Neural Information Processing SystemsDec-31-2009

Inferring rankings under constrained sensing

Jagabathula, Srikanth, Shah, Devavrat

We consider the question of determining a real-valued function on the space of permutations of n elements with very limited observations.

algorithm, artificial intelligence, permutation, (17 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence (0.68)

Neural Information Processing SystemsDec-31-2009

A Data-Driven Approach to Modeling Choice

Farias, Vivek, Jagabathula, Srikanth, Shah, Devavrat

We visit the following fundamental problem: For a `generic model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal preference information), how may one predict revenues from offering a particular assortment of choices? This problem is central to areas within operations research, marketing and econometrics. We present a framework to answer such questions and design a number of tractable algorithms (from a data and computational standpoint) for the same.

artificial intelligence, choice model, optimization problem, (17 more...)

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)