budget
Crowdsourced Clustering: Querying Edges vs Triangles
We consider the task of clustering items using answers from non-expert crowd workers. In such cases, the workers are often unable to label the items directly; however, it is reasonable to assume that they can compare items and judge whether they are similar or not. An important question is what queries to make, and we compare two types: random edge queries, where a pair of items is revealed, and random triangle queries, where a triple is revealed. Since it is far too expensive to query all possible edges and/or triangles, we need to work with partial observations subject to a fixed query budget constraint. When a generative model for the data is available (and we consider a few of these), we determine the cost of a query by its entropy; when no such model exists, we use the workers' average response time per query as a surrogate for the cost. In addition to theoretical justification, through several simulations and experiments on two real data sets on Amazon Mechanical Turk, we empirically demonstrate that, for a fixed budget, triangle queries uniformly outperform edge queries. Even though triangle queries, in contrast to edge queries, reveal dependent edges, they provide more reliable edges and, for a fixed budget, many more of them. We also provide a sufficient condition on the number of observations, the edge densities inside and outside the clusters, and the minimum cluster size required for exact recovery of the true adjacency matrix via triangle queries using a convex optimization-based clustering algorithm.
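To make the entropy-as-cost idea concrete, here is a minimal sketch under a toy model that is an assumption of this example, not one of the generative models from the paper: items fall independently and uniformly into K clusters, and workers answer without error. The function names and the budget of 100 bits are hypothetical. The sketch computes the entropy of an edge answer and of a triangle answer, then counts how many edges each query type reveals for the same budget.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def edge_distribution(k):
    """Answer distribution of an edge query under the assumed toy model.

    Two outcomes: the pair is in the same cluster, or it is not.
    """
    same = 1.0 / k
    return [same, 1.0 - same]


def triangle_distribution(k):
    """Answer distribution of a triangle query under the assumed toy model.

    Five consistent outcomes: all three items together, exactly one of the
    three pairs together, or all three items in different clusters.
    """
    all_same = 1.0 / k**2
    one_pair = (1.0 / k) * (1.0 - 1.0 / k)  # for each of the three pairs
    all_diff = (1.0 - 1.0 / k) * (1.0 - 2.0 / k)
    return [all_same, one_pair, one_pair, one_pair, all_diff]


def edges_within_budget(budget_bits, cost_bits, edges_per_query):
    """Number of edges revealed when every query costs `cost_bits`."""
    return int(budget_bits // cost_bits) * edges_per_query


if __name__ == "__main__":
    k = 2  # hypothetical number of clusters
    budget = 100.0  # hypothetical budget, in bits
    h_edge = entropy_bits(edge_distribution(k))
    h_tri = entropy_bits(triangle_distribution(k))
    print("edge cost (bits):", h_edge)
    print("triangle cost (bits):", h_tri)
    print("edges via edge queries:", edges_within_budget(budget, h_edge, 1))
    print("edges via triangle queries:", edges_within_budget(budget, h_tri, 3))
```

In this toy setting a triangle query costs more than an edge query but less than three times as much, so the same budget buys more edges. The sketch says nothing about edge reliability or about the dependence between the three edges of a triangle, both of which the abstract treats separately.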
Advice Querying under Budget Constraint for Online Algorithms
This gave birth to learning-augmented algorithms, which use these predictions to go beyond the standard long-standing worst-case limitations. The design of such algorithms requires establishing good tradeoffs between consistency and robustness, i.e. having improved performance when the predictions are accurate, and not behaving poorly