Reviews: Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing Systems 

A complex deep generative model (DGM) is proposed that jointly models observations and crowdsourced annotations indicating whether or not two observations belong to the same cluster. This allows non-expert crowdsourced annotations to help with clustering complex data. Importantly, the model is developed for the semi-supervised case, i.e., annotations are observed for only a small proportion of observation pairs. The authors propose a hierarchical VAE structure to model the observations, with a discrete latent variable z \sim p(z | \pi), a continuous latent variable x \sim p(x | z), and observed data o \sim p(o | x). This is paired with a two-coin Dawid-Skene model for the annotations, conditioned on the mixture variables: L \sim p(L | z_i, z_j, \alpha, \beta), where \alpha and \beta are annotator-specific latent variables that model the "expertise" of the m-th annotator (precision and recall parameters, respectively). To the best of my understanding, z represents cluster membership in the model; this is not stated explicitly in the paper, but it follows from the two-coin model's dependence on the latent mixture assignment.
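To make the structure of the model concrete, the following is a minimal ancestral-sampling sketch of the generative story as I understand it, not the authors' implementation. Everything concrete in it is a hypothetical stand-in: two clusters, a unit-variance Gaussian p(x | z) with hand-picked means, a fixed linear map plus small noise in place of the neural decoder p(o | x), every annotator labeling the same random subset of pairs, and the standard two-coin parameterization alpha = P(L = 1 | z_i = z_j), beta = P(L = 0 | z_i != z_j), which may not match the paper's exact definitions. The function `posterior_same` illustrates how annotator-specific reliability determines how much a single "same"/"different" label shifts belief about a pair.

```python
import math
import random

# Hypothetical sketch of the generative story summarized in the review.
# All concrete choices (Gaussian clusters, a fixed linear "decoder",
# the two-coin parameterization) are illustrative assumptions.


def sample_categorical(probs, rng):
    """Draw index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def sample_observation(pi, means, decoder, rng):
    """Ancestral sampling: z ~ p(z | pi), x ~ p(x | z), o ~ p(o | x)."""
    z = sample_categorical(pi, rng)
    # Assumed unit-variance Gaussian p(x | z) centred on a per-cluster mean.
    x = [rng.gauss(mu, 1.0) for mu in means[z]]
    # Hypothetical linear map plus small noise standing in for the neural decoder.
    o = [sum(w * xi for w, xi in zip(row, x)) + rng.gauss(0.0, 0.1)
         for row in decoder]
    return z, x, o


def two_coin_prob_same(z_i, z_j, alpha, beta):
    """P(L = 1, i.e. 'same cluster') under the assumed two-coin model:
    alpha = P(L = 1 | z_i == z_j), beta = P(L = 0 | z_i != z_j)."""
    return alpha if z_i == z_j else 1.0 - beta


def simulate(n, pi, means, decoder, annotators, frac_pairs, seed=0):
    """Sample n observations and annotate only a fraction of the pairs.
    For simplicity every annotator labels the same random subset of pairs."""
    rng = random.Random(seed)
    data = [sample_observation(pi, means, decoder, rng) for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    observed = rng.sample(pairs, int(frac_pairs * len(pairs)))
    labels = {}
    for m, (alpha, beta) in enumerate(annotators):
        for i, j in observed:
            p = two_coin_prob_same(data[i][0], data[j][0], alpha, beta)
            labels[(m, i, j)] = 1 if rng.random() < p else 0
    return data, labels


def posterior_same(pair_labels, annotators, prior_same):
    """P(z_i == z_j | labels) for one pair; pair_labels is a list of
    (annotator index, label). Reliable annotators move the posterior more."""
    like_same, like_diff = 1.0, 1.0
    for m, label in pair_labels:
        alpha, beta = annotators[m]
        like_same *= alpha if label == 1 else 1.0 - alpha
        like_diff *= (1.0 - beta) if label == 1 else beta
    numerator = prior_same * like_same
    return numerator / (numerator + (1.0 - prior_same) * like_diff)


if __name__ == "__main__":
    annotators = [(0.9, 0.8), (0.6, 0.55)]  # hypothetical (alpha, beta) per annotator
    data, labels = simulate(
        n=10, pi=[0.5, 0.5], means=[[-2.0, -2.0], [2.0, 2.0]],
        decoder=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        annotators=annotators, frac_pairs=0.2)
    print(len(data), "observations,", len(labels), "annotations")
    print(posterior_same([(0, 1), (1, 0)], annotators, prior_same=0.5))
```

In the usage example, the reliable annotator's "same" vote outweighs the weaker annotator's "different" vote, giving a posterior of roughly 0.77 that the pair shares a cluster. Since only 20% of pairs are annotated, most pairwise constraints are unobserved, which is the semi-supervised regime the paper targets.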