AITopics | Crowdsourcing

Crowdsourcing

Appendix

Neural Information Processing Systems | May-23-2025, 15:53:48 GMT

Figure 9: Example showing how a single line of HTML code is rendered by a browser. In this example, some tags delimit separate blocks, which are therefore separated by line breaks, while other tags are rendered inline, on the same line as the text that precedes and follows them.

artificial intelligence, natural language, social media, (17 more...)

Country: Africa (0.20)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.31)

b1f7288854d3bd476c17725c2d85967f-Paper-Conference.pdf

Neural Information Processing Systems | Mar-27-2025, 12:43:37 GMT

artificial intelligence, machine learning, social media, (17 more...)

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Industry:

Information Technology (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Communications > Social Media > Crowdsourcing (0.46)

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Neural Information Processing Systems | Mar-27-2025, 12:21:34 GMT

We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.

caption, large language model, machine learning, (18 more...)

Country: North America > United States > New York (0.27)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

KFNN: K-Free Nearest Neighbor For Crowdsourcing

Neural Information Processing Systems | Mar-27-2025, 09:20:48 GMT

To reduce annotation costs, it is common in crowdsourcing to collect only a few noisy labels from different crowd workers for each instance. However, the limited noisy labels restrict the performance of label integration algorithms in inferring the unknown true label for the instance. Recent works have shown that leveraging neighbor instances can help alleviate this problem. Yet, these works all assume that each instance has the same neighborhood size, which defies common sense. To address this gap, we propose a novel label integration algorithm called K-free nearest neighbor (KFNN). In KFNN, the neighborhood size of each instance is automatically determined based on its attributes and noisy labels.
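The fixed-neighborhood assumption that the abstract criticizes can be illustrated with a small sketch. This is not KFNN itself: it is a plain majority-vote baseline that pools each instance's noisy labels with those of its k nearest neighbors, using the same k for every instance. The function names, the Euclidean distance, and the tie-breaking rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


def knn_augmented_vote(features, noisy_labels, k):
    """Integrate labels per instance from its own noisy labels plus the
    noisy labels of its k nearest neighbors (same k for every instance)."""
    integrated = []
    for i, x in enumerate(features):
        # Rank all other instances by Euclidean distance to instance i.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(x, features[j]),
        )
        pooled = list(noisy_labels[i])
        for j in others[:k]:
            pooled.extend(noisy_labels[j])
        integrated.append(majority_vote(pooled))
    return integrated


# Hypothetical toy data: two tight clusters, two noisy labels per instance.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
noisy_labels = [[0, 1], [0, 0], [1, 1], [1, 0]]

plain = [majority_vote(labels) for labels in noisy_labels]
pooled_k1 = knn_augmented_vote(features, noisy_labels, k=1)
pooled_k3 = knn_augmented_vote(features, noisy_labels, k=3)
```

On this toy data, k=1 resolves the tied votes using the nearest neighbor, while k=3 pools labels across both clusters and erases the distinction between them. That sensitivity to a single global k is the kind of problem a per-instance neighborhood size is meant to avoid.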

artificial intelligence, machine learning, social media, (16 more...)

Country:

Asia (0.68)
North America > United States > California (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.87)

A Supplementary Material

Amelia Jimenez Sanchez

Neural Information Processing Systems | Mar-27-2025, 08:26:01 GMT

A.1 Data Cards. Table 2 shows the extracted documentation parameters from Kaggle and HuggingFace, which we categorized according to Datasheets [40]. On HuggingFace, we find information about the annotation creators (e.g., crowdsource, experts, ml-generated) or specific task categories (e.g., image-classification, image-to-text, text-to-image). Such parameters can be used to filter results when searching on HuggingFace, potentially enabling systematic analysis of a specific task or tag. On Kaggle, we notice that some important parameters shown on the dataset website, such as temporal and geospatial coverage, data collection methodology, provenance, DOI citation, and update frequency, cannot be automatically extracted through its API, so we included them manually. Kaggle automatically computes a usability score, which is associated with the tag "well-documented" and used for ranking results when searching for a dataset.

artificial intelligence, dataset, social media, (17 more...)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (0.37)
Information Technology > Communications > Social Media > Crowdsourcing (0.35)

Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing

Zehong Hu, Yitao Liang, Jie Zhang, Zhao Li, Yang Liu

Neural Information Processing Systems | Mar-27-2025, 04:33:21 GMT

Incentive mechanisms for crowdsourcing are designed to incentivize financially self-interested workers to generate and report high-quality labels. Existing mechanisms are often developed as one-shot static solutions, assuming a certain level of knowledge about worker models (expertise levels, costs of exerting efforts, etc.). In this paper, we propose a novel inference aided reinforcement mechanism that learns to incentivize high-quality data sequentially and requires no such prior assumptions. Specifically, we first design a Gibbs sampling augmented Bayesian inference algorithm to estimate workers' labeling strategies from the collected labels at each step. Then we propose a reinforcement incentive learning (RIL) method, building on top of the above estimates, to uncover how workers respond to different payments. RIL dynamically determines the payment without accessing any ground-truth labels. We theoretically prove that RIL is able to incentivize rational workers to provide high-quality labels. Empirical results show that our mechanism performs consistently well under both rational and non-fully rational (adaptive learning) worker models. Besides, the payments offered by RIL are more robust and have lower variances compared to the existing one-shot mechanisms.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)
Information Technology > Communications > Social Media > Crowdsourcing (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Shahana Ibrahim, Xiao Fu, Nikolaos Kargas, Kejun Huang

Neural Information Processing Systems | Mar-26-2025, 19:20:19 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, social media, (19 more...)

Country: North America > United States > Minnesota (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Triple Eagle: Simple, Fast and Practical Budget-Feasible Mechanisms

Neural Information Processing Systems | Mar-26-2025, 17:15:26 GMT

We revisit the classical problem of designing Budget-Feasible Mechanisms (BFMs) for submodular valuation functions, which has been extensively studied since the seminal paper of Singer [FOCS'10] due to its wide applications in crowdsourcing and social marketing. We propose TripleEagle, a novel algorithmic framework for designing BFMs, based on which we present several simple yet effective BFMs that achieve better approximation ratios than the state-of-the-art work for both monotone and non-monotone submodular valuation functions. Moreover, our BFMs are the first in the literature to achieve linear complexities while ensuring obvious strategyproofness, making them more practical than the previous BFMs. We conduct extensive experiments to evaluate the empirical performance of our BFMs, and the experimental results strongly demonstrate the efficiency and effectiveness of our approach.
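To make the problem setting concrete, here is a minimal sketch of selecting sellers under a budget for a monotone submodular valuation (set coverage). It is not TripleEagle and not a mechanism at all: it assumes costs are known and truthful, so it offers no strategyproofness guarantee and no approximation ratio is claimed. The names `coverage` and `greedy_under_budget` and the toy data are hypothetical.

```python
def coverage(selected, covers):
    """Monotone submodular value: number of distinct items covered."""
    covered = set()
    for s in selected:
        covered |= covers[s]
    return len(covered)


def greedy_under_budget(covers, costs, budget):
    """Repeatedly add the affordable seller with the best marginal gain
    per unit cost. Costs are taken as given (no strategic behavior)."""
    chosen, spent = [], 0.0
    remaining = set(covers)
    while True:
        base = coverage(chosen, covers)
        best, best_ratio = None, 0.0
        for s in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[s] > budget:
                continue  # seller does not fit in the remaining budget
            gain = coverage(chosen + [s], covers) - base
            ratio = gain / costs[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:  # nothing affordable adds value
            return chosen, spent
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)


# Hypothetical sellers: the items each one covers and the cost of each.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
costs = {"a": 3.0, "b": 1.0, "c": 2.0}
chosen, spent = greedy_under_budget(covers, costs, budget=4.0)
```

A budget-feasible mechanism must additionally decide payments so that sellers have no incentive to misreport costs while total payments stay within the budget, which is the part this sketch leaves out.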

artificial intelligence, machine learning, mechanism, (17 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Communications > Social Media > Crowdsourcing (0.49)

IWBVT: Instance Weighting-based Bias-Variance Trade-off for Crowdsourcing

Neural Information Processing Systems | Mar-26-2025, 04:22:36 GMT

In recent years, a large number of algorithms for label integration and noise correction have been proposed to infer the unknown true labels of instances in crowdsourcing. They have made great advances in improving the label quality of crowdsourced datasets. However, due to the presence of intractable instances, these algorithms usually improve model quality less than they improve label quality. To improve the model quality, this paper proposes an instance weighting-based bias-variance trade-off (IWBVT) approach. IWBVT first proposes a novel instance weighting method based on the complementary set and entropy, which mitigates the impact of intractable instances and thus makes the bias and variance of trained models closer to the unknown true results. Then, IWBVT performs probabilistic loss regressions based on the bias-variance decomposition, which achieves the bias-variance trade-off and thus reduces the generalization error of trained models. Experimental results indicate that IWBVT can serve as a universal post-processing approach that significantly improves the model quality of existing state-of-the-art label integration algorithms and noise correction algorithms.
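The idea of down-weighting instances whose crowd labels disagree can be sketched with entropy alone. This is only an illustration of entropy-based weighting, not the IWBVT weighting method: the complementary-set component mentioned in the abstract is omitted, and the function names and the formula weight = 1 - normalized entropy are assumptions.

```python
import math
from collections import Counter


def label_entropy(labels, num_classes):
    """Shannon entropy of an instance's noisy labels, normalized to [0, 1].
    Requires num_classes >= 2 and a non-empty label list."""
    counts = Counter(labels)
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(num_classes)


def instance_weights(noisy_labels, num_classes):
    """Assumed weighting: full agreement gives 1, uniform disagreement 0."""
    return [1.0 - label_entropy(labels, num_classes) for labels in noisy_labels]


# Hypothetical binary task with four crowd labels per instance.
noisy_labels = [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
weights = instance_weights(noisy_labels, num_classes=2)
```

Weights of this kind could be passed to any learner that accepts per-sample weights, so that instances with contested labels contribute less to training.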

artificial intelligence, machine learning, social media, (16 more...)

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing Systems | Mar-25-2025, 02:33:43 GMT

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade off between its complexity and its fit to the data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.

annotation, artificial intelligence, machine learning, (14 more...)

Country: Asia (0.28)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)
