Deep Constrained Clustering - Algorithms and Advances

arXiv.org Machine Learning

The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering, but have several limitations. We explore a deep learning formulation of constrained clustering and, in particular, how it can extend the field of constrained clustering. We show that our formulation can not only handle standard together/apart constraints without the well-documented negative effects previously reported, but can also model instance-level constraints (level of difficulty), cluster-level constraints (balancing cluster size), and triplet constraints. The first two are new ways for domain experts to enforce guidance, whilst the latter importantly allows generating ordering constraints from continuous side-information.
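
To make the together/apart idea concrete, here is a minimal sketch of how must-link and cannot-link pairs can be turned into a differentiable loss on the soft cluster assignments produced by a deep clustering network. The function name and the exact form of the loss are illustrative assumptions, not the paper's formulation.

```python
import torch

def pairwise_constraint_loss(q, must_link, cannot_link, eps=1e-8):
    """Differentiable pairwise-constraint loss (illustrative sketch).

    q: (n, k) soft cluster assignments, rows sum to 1.
    must_link / cannot_link: lists of (i, j) index pairs.
    For a must-link pair we push the two assignment distributions to agree
    (high probability of sharing a cluster); for a cannot-link pair we push
    them apart.
    """
    loss = q.new_zeros(())
    for i, j in must_link:
        same_cluster = (q[i] * q[j]).sum()   # prob. the pair lands in one cluster
        loss = loss - torch.log(same_cluster + eps)
    for i, j in cannot_link:
        same_cluster = (q[i] * q[j]).sum()
        loss = loss - torch.log(1.0 - same_cluster + eps)
    return loss
```

In practice a term like this would be added, with a small weight, to the network's unsupervised clustering objective.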


Fairness in Recommendation Ranking through Pairwise Comparisons

arXiv.org Artificial Intelligence

Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such, it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular, we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer that encourages improving it during model training and thus improves fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system's pairwise fairness.
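
As an illustration of the pairwise view, the sketch below computes a per-group pairwise accuracy from (clicked item, comparison item) score pairs gathered in randomized experiments: the fraction of pairs where the model ranks the clicked item above the comparison item, split by group. The function name and its inputs are hypothetical; the paper's actual metrics and regularizer are more refined.

```python
import numpy as np

def pairwise_accuracy_by_group(clicked_scores, other_scores, group):
    """Fraction of (clicked, comparison) pairs ranked correctly, per group (sketch).

    clicked_scores, other_scores: 1-D arrays of model scores for the clicked
    item and a comparison item drawn from the same randomized slate.
    group: array of group labels, one per pair.
    """
    clicked_scores = np.asarray(clicked_scores)
    other_scores = np.asarray(other_scores)
    group = np.asarray(group)
    correct = (clicked_scores > other_scores).astype(float)
    return {g: correct[group == g].mean() for g in np.unique(group)}
```

A pairwise regularizer of the kind the abstract describes would then penalize the gap between these per-group accuracies during training.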


One-Sided Unsupervised Domain Mapping

Neural Information Processing Systems

In unsupervised domain mapping, the learner is given two unmatched datasets $A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample in $A$ to the analogous sample in $B$. Recent approaches have shown that when both $G_{AB}$ and the inverse mapping $G_{BA}$ are learned simultaneously, convincing mappings are obtained. In this work, we present a method for learning $G_{AB}$ without learning $G_{BA}$. This is done by learning a mapping that maintains the distance between a pair of samples. Moreover, good mappings are obtained even by maintaining the distance between different parts of the same sample before and after mapping. We present experimental results showing that the new method not only allows for one-sided mapping learning, but also leads to better numerical results than the existing circularity-based constraint. Our entire code is made publicly available at~\url{https://github.com/sagiebenaim/DistanceGAN}.
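
A rough sketch of the distance-preservation idea follows: within a batch, pairwise distances between source samples should match the (standardized) pairwise distances between their mapped versions. This is a simplified illustration under assumed tensor shapes; the authors' reference implementation lives at the repository linked above.

```python
import torch

def distance_preservation_loss(x, gx, eps=1e-8):
    """Encourage G to preserve pairwise distances within a batch (sketch).

    x:  batch of source samples, shape (n, ...).
    gx: G(x), the mapped batch, same leading dimension n.
    Distances are standardized per domain so the two scales are comparable.
    """
    n = x.size(0)
    xf, gf = x.reshape(n, -1), gx.reshape(n, -1)
    d_a = torch.cdist(xf, xf)                      # pairwise distances in domain A
    d_b = torch.cdist(gf, gf)                      # pairwise distances after mapping
    mask = ~torch.eye(n, dtype=torch.bool, device=x.device)
    d_a = (d_a[mask] - d_a[mask].mean()) / (d_a[mask].std() + eps)
    d_b = (d_b[mask] - d_b[mask].mean()) / (d_b[mask].std() + eps)
    return (d_a - d_b).abs().mean()
```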


Sachan

AAAI Conferences

In this paper, we define the problem of coreference resolution in text as one of clustering with pairwise constraints, where human experts are asked to provide pairwise judgments of coreferentiality to guide the clustering process. Positing that these pairwise judgments are easy to obtain from humans given the right context, we show that with a significantly smaller number of pairwise judgments and less feature-engineering effort, we can achieve competitive coreference performance. Further, we describe an active learning strategy that minimizes the overall number of such pairwise judgments needed by asking the most informative questions to human experts at each step of coreference resolution. We evaluate this hypothesis and our algorithms on both entity and event coreference tasks and in two languages.
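
One common way to realize "asking the most informative questions" is uncertainty sampling over mention pairs, sketched below: query the pair whose predicted coreference probability is closest to 0.5. The helper name and the selection criterion are illustrative assumptions, not necessarily the paper's exact strategy.

```python
def most_informative_pair(pair_probs, already_queried):
    """Pick the unqueried mention pair with the most uncertain prediction (sketch).

    pair_probs: dict mapping (i, j) mention-index pairs to the model's
    predicted probability that the two mentions corefer.
    already_queried: set of pairs a human expert has already judged.
    """
    candidates = [p for p in pair_probs if p not in already_queried]
    return min(candidates, key=lambda p: abs(pair_probs[p] - 0.5))
```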


Fu

AAAI Conferences

This paper presents a novel symmetric graph regularization framework for pairwise constraint propagation. We first decompose the challenging problem of pairwise constraint propagation into a series of two-class label propagation subproblems and then solve these subproblems by quadratic optimization with symmetric graph regularization. More importantly, we show that pairwise constraint propagation is actually equivalent to solving a Lyapunov matrix equation, which is widely used in control theory as a standard continuous-time equation. Unlike most previous constraint propagation methods, which suffer from severe limitations, our method can be directly applied to multi-class problems and can effectively exploit both must-link and cannot-link constraints. The propagated constraints are further used to adjust the similarity between data points so that they can be incorporated into subsequent clustering. The proposed method has been tested on clustering tasks over six real-life data sets and shown to achieve significant improvements over the state of the art.
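
To illustrate the Lyapunov-equation view, the toy sketch below propagates an initial constraint matrix Z (+1 for must-link, -1 for cannot-link, 0 for unknown) over a similarity graph by calling SciPy's continuous-time Lyapunov solver. The coefficient matrix built from the normalized similarity graph is a generic choice made here for illustration; the paper derives its own matrices from the symmetric graph regularization objective.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def propagate_constraints(W, Z, alpha=0.5):
    """Toy constraint propagation via a Lyapunov equation A F + F A^T = -Z (sketch).

    W: (n, n) symmetric similarity matrix of the data graph.
    Z: (n, n) initial constraint matrix (+1 must-link, -1 cannot-link, 0 unknown).
    Returns F, a dense propagated-constraint matrix.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = d_inv_sqrt @ W @ d_inv_sqrt        # symmetrically normalized similarity
    A = np.eye(len(W)) - alpha * S         # positive definite for 0 < alpha < 1
    # solve_continuous_lyapunov(A, Q) solves A F + F A^T = Q, so pass Q = -Z
    return solve_continuous_lyapunov(A, -Z)
```

The propagated matrix F could then be used to adjust the pairwise similarities fed to a standard clustering algorithm, as the abstract describes.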