In this paper, we define the problem of coreference resolution in text as one of clustering with pairwise constraints, where human experts are asked to provide the constraints (pairwise judgments of coreferentiality) that guide the clustering process. Positing that these pairwise judgments are easy to obtain from humans given the right context, we show that competitive coreference performance can be achieved with significantly fewer pairwise judgments and less feature-engineering effort. Further, we describe an active learning strategy that minimizes the overall number of pairwise judgments needed by asking human experts the most informative questions at each step of coreference resolution. We evaluate this hypothesis and our algorithms on both entity and event coreference tasks and on two languages.
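The abstract does not say which criterion makes a question "most informative". As a minimal sketch, assuming plain uncertainty sampling (a common active learning heuristic, not necessarily the strategy used in the paper), the next pair to show an expert could be the one whose predicted coreference probability is closest to 0.5. The function name, mention strings, and probabilities below are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # a pair of mentions (hypothetical representation)


def most_uncertain_pair(probs: Dict[Pair, float]) -> Pair:
    """Return the pair whose coreference probability is closest to 0.5.

    Assumption: `probs` maps each unjudged mention pair to the current
    model's estimate that the two mentions corefer.
    """
    return min(probs, key=lambda pair: abs(probs[pair] - 0.5))


# Illustrative, made-up model estimates for three candidate pairs.
candidate_probs = {
    ("Obama", "the president"): 0.55,
    ("Obama", "he"): 0.90,
    ("Paris", "he"): 0.05,
}

next_query = most_uncertain_pair(candidate_probs)
print(next_query)
```

In a full loop, the expert's answer would be added as a must-link or cannot-link constraint, the clustering re-run, and the probabilities re-estimated before the next query is chosen.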
This paper presents a novel symmetric graph regularization framework for pairwise constraint propagation. We first decompose the challenging problem of pairwise constraint propagation into a series of two-class label propagation subproblems and then solve these subproblems by quadratic optimization with symmetric graph regularization. More importantly, we show that pairwise constraint propagation is equivalent to solving a Lyapunov matrix equation, which is widely used in control theory as a standard continuous-time equation. Unlike most previous constraint propagation methods, which suffer from severe limitations, our method can be applied directly to multi-class problems and can effectively exploit both must-link and cannot-link constraints. The propagated constraints are then used to adjust the similarity between data points so that they can be incorporated into subsequent clustering. The proposed method has been tested in clustering tasks on six real-life data sets and shown to achieve significant improvements over the state of the art.
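To make the Lyapunov connection concrete, here is a short derivation under stated assumptions. The symbols are illustrative and not taken from the abstract: $Z$ is the initial $n \times n$ constraint matrix (for example $+1$ for must-link, $-1$ for cannot-link, $0$ for unknown), $F$ is the propagated constraint matrix, $L$ is a symmetric graph Laplacian, and $\mu > 0$ is a regularization weight. Assuming an objective that fits $F$ to $Z$ while smoothing it along both rows and columns, setting the gradient to zero gives an equation of Lyapunov form:

```latex
\begin{aligned}
\mathcal{Q}(F) &= \|F - Z\|_F^2
  + \frac{\mu}{2}\left[\operatorname{tr}\!\left(F^\top L F\right)
  + \operatorname{tr}\!\left(F L F^\top\right)\right], \\
\frac{\partial \mathcal{Q}}{\partial F} &= 2\,(F - Z) + \mu\,(L F + F L) = 0, \\
&\Longrightarrow\; (\mu L + I)\,F + F\,(\mu L + I) = 2Z .
\end{aligned}
```

With $A = \mu L + I$, this reads $A F + F A^\top = 2Z$. If $L$ is positive semidefinite, then $A$ is positive definite, so every sum of two eigenvalues of $A$ is positive and the solution $F$ is unique.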
Multi-label classification has attracted an increasing amount of attention in recent years. To this end, many algorithms have been developed to classify multi-label data in an effective manner. However, they usually do not consider the pairwise relations indicated by sample labels, which actually play important roles in multi-label classification. Inspired by this, we naturally extend the traditional pairwise constraints to the multi-label scenario via a flexible thresholding scheme. Moreover, to improve the generalization ability of the classifier, we adopt a boosting-like strategy to construct a multi-label ensemble from a group of base classifiers. To achieve these goals, this paper presents a novel multi-label classification framework named Variable Pairwise Constraint projection for Multi-label Ensemble (VPCME). Specifically, we take advantage of the variable pairwise constraint projection to learn a lower-dimensional data representation, which preserves the correlations between samples and labels. Thereafter, the base classifiers are trained in the new data space. For the boosting-like strategy, we employ both the variable pairwise constraints and the bootstrap steps to diversify the base classifiers. Empirical studies have shown the superiority of the proposed method in comparison with other approaches.
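The abstract does not spell out the thresholding scheme, so the following is only an illustrative sketch under an assumption: two samples are treated as a must-link pair when the Jaccard similarity of their label sets reaches a chosen threshold, and as a cannot-link pair otherwise. The function names, the similarity measure, and the toy labels are all hypothetical and are not claimed to be VPCME's actual definition.

```python
from itertools import combinations
from typing import List, Set, Tuple

Constraint = Tuple[int, int]  # indices of two samples


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two label sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def variable_constraints(
    label_sets: List[Set[str]], threshold: float
) -> Tuple[List[Constraint], List[Constraint]]:
    """Split all sample pairs into must-link and cannot-link lists.

    Assumption: a pair is must-link when its label-set similarity is at
    least `threshold`; varying the threshold varies the constraint sets.
    """
    must_link: List[Constraint] = []
    cannot_link: List[Constraint] = []
    for i, j in combinations(range(len(label_sets)), 2):
        if jaccard(label_sets[i], label_sets[j]) >= threshold:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


# Toy multi-label data: three samples with made-up labels.
labels = [{"sports", "news"}, {"sports"}, {"finance"}]
must, cannot = variable_constraints(labels, threshold=0.5)
print(must, cannot)
```

Raising the threshold moves pairs from the must-link list to the cannot-link list, which is one way different base classifiers in an ensemble could be given different constraint sets.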
Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a super-instance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such super-instances in a top-down manner to produce high-quality results early on in the clustering process, and keeps refining these super-instances as more pairwise queries are given to get more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings at fast run times, making it an excellent candidate for the iterative clustering scenario outlined above.
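The super-instance idea can be illustrated with a simplified sketch. It is not the COBRAS algorithm itself: it assumes the super-instances are already given (the top-down refinement step is omitted), represents each one by a medoid-like member, and merges super-instances by querying an oracle about pairs of representatives only. All names and data below are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def representative(points: Sequence[Point], members: List[int]) -> int:
    """Return the member index closest to the mean of the super-instance."""
    cx = sum(points[i][0] for i in members) / len(members)
    cy = sum(points[i][1] for i in members) / len(members)
    return min(
        members,
        key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
    )


def cluster_super_instances(
    points: Sequence[Point],
    super_instances: List[List[int]],
    must_link: Callable[[int, int], bool],
) -> Tuple[List[int], int]:
    """Merge super-instances via pairwise queries on their representatives.

    Returns one cluster label per point and the number of queries asked.
    """
    reps = [representative(points, si) for si in super_instances]
    clusters: List[List[int]] = []  # each cluster is a list of super-instance ids
    queries = 0
    for s, rep in enumerate(reps):
        for cluster in clusters:
            queries += 1
            # One query per existing cluster, against its first super-instance.
            if must_link(rep, reps[cluster[0]]):
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no must-link answer: start a new cluster
    labels = [-1] * len(points)
    for label, cluster in enumerate(clusters):
        for s in cluster:
            for i in super_instances[s]:
                labels[i] = label
    return labels, queries


# Toy data: the simulated user has two clusters in mind.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
truth = [0, 0, 0, 1, 1, 0]
super_instances = [[0, 1], [3, 4], [2, 5]]

labels, n_queries = cluster_super_instances(
    points, super_instances, lambda i, j: truth[i] == truth[j]
)
print(labels, n_queries)
```

Because queries are posed between super-instance representatives rather than individual points, the number of questions grows with the number of super-instances, not with the size of the data set. This is what makes a coarse clustering available after only a few answers.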