### A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

Marek Smieja (a), Łukasz Struski (a), Mário A. T. Figueiredo (b)

(a) Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
(b) Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

Abstract

In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S$^3$C$^2$ (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. Moreover, being classification-based, our method solves only well-defined classification problems, rather than less well-specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.

Keywords: semi-supervised clustering, deep learning, neural networks, pairwise constraints

1. Introduction

Clustering is an important unsupervised learning tool often used to analyze the structure of complex high-dimensional data. Semi-supervised clustering (SSC) methods leverage partial prior information about class labels, with the goal of obtaining partitions that are better aligned with true classes [1, 2, 3, 4, 5, 6].
One typical way of injecting class label information into clustering is in the form of pairwise constraints (typically, must-link and cannot-link constraints), or pairwise preferences (e.g., should-link and shouldn't-link), which indicate whether a given pair of points is believed to belong to the same or different classes. Most SSC approaches rely on adapting existing unsupervised clustering methods to handle partial (namely, pairwise) information [7, 8, 4, 5, 6, 9].
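
To make the notion of pairwise constraints concrete, the following minimal sketch derives must-link and cannot-link pairs from a small set of labeled points. It is an illustration only: the function name `pairwise_constraints` and the dictionary-based input format are assumptions made here and do not come from the paper.

```python
from itertools import combinations


def pairwise_constraints(labels):
    """Derive must-link / cannot-link pairs from a partial labeling.

    `labels` maps a point index to its class label (hypothetical input
    format); unlabeled points are simply absent from the mapping.
    """
    must_link, cannot_link = [], []
    # Visit every unordered pair of labeled points exactly once.
    for i, j in combinations(sorted(labels), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))    # same class: must-link
        else:
            cannot_link.append((i, j))  # different classes: cannot-link
    return must_link, cannot_link


# Three labeled points out of a larger, mostly unlabeled dataset.
partial_labels = {0: "a", 3: "a", 5: "b"}
ml, cl = pairwise_constraints(partial_labels)
print(ml)  # [(0, 3)]
print(cl)  # [(0, 5), (3, 5)]
```

In practice the constraints are often given directly rather than derived from labels; the sketch only shows how the two kinds of pairs relate to class membership.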

### ClusterNet: Semi-Supervised Clustering using Neural Networks

Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or by their dependence on a large set of labeled data samples. In this paper, we propose ClusterNet, which uses pairwise semantic constraints from very few labeled data samples (< 5% of total data) and exploits the abundant unlabeled data to drive the clustering approach. We define a new loss function that uses pairwise semantic similarity between objects combined with constrained k-means clustering to efficiently utilize both labeled and unlabeled data in the same framework. The proposed network uses a convolutional autoencoder to learn a latent representation that groups data into k specified clusters, while also learning the cluster centers simultaneously. We evaluate ClusterNet on several datasets and compare its performance with state-of-the-art deep clustering approaches.
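
The abstract does not give the form of the loss, so the following is only a generic sketch of how a pairwise term and a k-means term might be combined over latent embeddings. It is not ClusterNet's actual objective: the function name `combined_loss`, the contrastive-style pairwise penalty, and the `weight` and `margin` parameters are all assumptions introduced for illustration.

```python
import numpy as np


def combined_loss(z, centers, must_link, cannot_link, weight=1.0, margin=1.0):
    """Illustrative k-means + pairwise loss (assumed form, not the paper's).

    z:           (n, d) latent embeddings
    centers:     (k, d) cluster centers
    must_link:   list of (i, j) index pairs that should be close
    cannot_link: list of (i, j) index pairs that should be at least
                 `margin` apart
    """
    # k-means term: mean squared distance to the nearest center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    kmeans_term = d2.min(axis=1).mean()

    # Pairwise term: pull must-link pairs together, push cannot-link
    # pairs apart up to the margin (hinge penalty).
    pair_term = 0.0
    for i, j in must_link:
        pair_term += np.sum((z[i] - z[j]) ** 2)
    for i, j in cannot_link:
        dist = np.linalg.norm(z[i] - z[j])
        pair_term += max(0.0, margin - dist) ** 2
    n_pairs = max(1, len(must_link) + len(cannot_link))

    return kmeans_term + weight * pair_term / n_pairs


z = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]])
centers = np.array([[0.0, 0.0], [3.0, 0.0]])

# Constraints that agree with the geometry give a small loss ...
consistent = combined_loss(z, centers, [(0, 1)], [(0, 2)])
# ... while constraints that contradict it are penalized.
violating = combined_loss(z, centers, [(0, 2)], [(0, 1)])
print(consistent < violating)
```

In a trained model the embeddings `z` would come from the encoder of the autoencoder and both `z` and `centers` would be updated by gradient descent; here they are fixed arrays so that the two terms can be inspected directly.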

### Multi-class Classification without Multi-class Labels

This work presents a new strategy for multi-class classification that requires no class-specific labels, but instead leverages pairwise similarity between examples, which is a weaker form of annotation. The proposed method, meta classification learning, optimizes a binary classifier for pairwise similarity prediction and through this process learns a multi-class classifier as a submodule. We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural network-based models. We then demonstrate that this same framework generalizes to the supervised, unsupervised cross-task, and semi-supervised settings. Our method is evaluated against the state of the art in all three learning paradigms and shows superior or comparable accuracy, providing evidence that learning multi-class classification without multi-class labels is a viable learning option.
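
One way to picture a binary similarity classifier that contains a multi-class classifier as a submodule is sketched below. The abstract does not state the loss, so the specific form used here is an assumption: the predicted similarity of a pair is taken to be the inner product of the two class-probability vectors, and it is scored with binary cross-entropy against the 0/1 similarity annotation. The names `softmax` and `pairwise_similarity_loss` are hypothetical helpers.

```python
import numpy as np


def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pairwise_similarity_loss(logits, similar, eps=1e-7):
    """Binary cross-entropy on predicted pairwise similarity (assumed form).

    logits:  (n, k) class scores from the multi-class submodule
    similar: (n, n) 0/1 matrix, 1 where a pair is annotated as similar
    """
    probs = softmax(logits)
    # Probability that two examples fall in the same class, assuming
    # their class predictions are independent given the inputs.
    s_hat = np.clip(probs @ probs.T, eps, 1.0 - eps)
    bce = -(similar * np.log(s_hat) + (1 - similar) * np.log(1 - s_hat))
    # Average over distinct pairs i < j only.
    iu = np.triu_indices(len(logits), k=1)
    return bce[iu].mean()


# Points 0 and 1 are similar; point 2 differs from both.
similar = np.array([[1, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]])
agreeing = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
disagreeing = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])

loss_good = pairwise_similarity_loss(agreeing, similar)
loss_bad = pairwise_similarity_loss(disagreeing, similar)
print(loss_good < loss_bad)
```

Only the pairwise annotation enters the loss, yet minimizing it shapes the per-example class probabilities, which is the sense in which the multi-class classifier is learned without class labels in this sketch.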

### A probabilistic constrained clustering for transfer learning and image category discovery

Neural network-based clustering has recently gained popularity, and in particular a constrained clustering formulation has been proposed to perform transfer learning and image category discovery using deep learning. The core idea is to formulate a clustering objective with pairwise constraints that can be used to train a deep clustering network; therefore the cluster assignments and their underlying feature representations are jointly optimized end-to-end. In this work, we provide a novel clustering formulation to address scalability issues of previous work in terms of optimizing deeper networks and larger numbers of categories. The proposed objective directly minimizes the negative log-likelihood of cluster assignment with respect to the pairwise constraints, has no hyper-parameters, and demonstrates improved scalability and performance on both supervised learning and unsupervised transfer learning.

### Learning Neural Models for End-to-End Clustering

We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters $k$, and for each $1 \leq k \leq k_\mathrm{max}$, a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this "learning to cluster" and show its conceptual difference from deep metric learning, semi-supervised clustering and other related approaches, while having the advantage of performing learnable clustering fully end-to-end.