AITopics | semi-crowdsourced clustering

Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing SystemsNov-20-2025, 22:03:48 GMT

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.

deep generative model, name change, semi-crowdsourced clustering, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.32)

Add feedback

Reviews: Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing SystemsOct-7-2024, 08:42:28 GMT

A complex DGM is proposed that jointly models observations with crowdsourced annotations of whether or not two observations belong to the same cluster. This allows crowdsourcing non-expert annotations to help with clustering complex data. Importantly, the model is developed for the semi-supervised case, i.e., annotations are only observed for a small proportion of observation pairs. The authors propose a hierarchical VAE structure to model the observations, with a discrete latent-variable z \sim p(z \pi), a continuous latent variable x \sim p(x z), and observed data o \sim p(o x). This is paired with a two-coin David-Skene model which is conditioned on the mixture variable z for annotations: L \sim p(L z_i, z_j, \alpha, \beta), where \alpha and \beta are annotator-specific latent variables that model the "expertise" of the m_th annotator (precision and recall parameters, respectively). To the best of my understanding, through the dependence of the two-coin model on the latent mixture association, though it is not explicitly stated in the paper, z represents cluster association in the model.

annotation, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Add feedback

Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Neural Information Processing SystemsFeb-16-2024, 07:37:48 GMT

One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called \textit{semi-crowdsourced clustering} that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects, from the manual annotations of only a small portion of the data to be clustered.

manual annotation, robust distance metric learning, semi-crowdsourced clustering, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Yi, Jinfeng, Jin, Rong, Jain, Shaili, Yang, Tianbao, Jain, Anil K.

Neural Information Processing SystemsFeb-14-2020, 23:12:41 GMT

One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called \textit{semi-crowdsourced clustering} that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects, from the manual annotations of only a small portion of the data to be clustered.

manual annotation, robust distance metric learning, semi-crowdsourced clustering, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Semi-crowdsourced Clustering with Deep Generative Models

Luo, Yucen, TIAN, TIAN, Shi, Jiaxin, Zhu, Jun, Zhang, Bo

Neural Information Processing SystemsFeb-14-2020, 12:12:03 GMT

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework.

deep generative model, semi-crowdsourced clustering, subset

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.82)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Filters

Collaborating Authors

semi-crowdsourced clustering

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Semi-crowdsourced Clustering with Deep Generative Models

Reviews: Semi-crowdsourced Clustering with Deep Generative Models

Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Semi-crowdsourced Clustering with Deep Generative Models