steinley
Minimum adjusted Rand index for two clusterings of a given size
Chacón, José E., Rastrojo, Ana I.
The adjusted Rand index is one of the most commonly used similarity measures to compare two clusterings of a given set of objects. Indeed, it is the recommended criterion for external clustering evaluation in the seminal study of Milligan and Cooper (1986). Nevertheless, many other measures for external clustering evaluation were recently surveyed in Meilă (2016). Initially, Rand (1971) considered a similarity index between two clusterings (the Rand index) defined as the proportion of object pairs that are either assigned to the same cluster in both clusterings or to different clusters in both clusterings. However, Morey and Agresti (1984) noted that such an index does not take into account the possible agreement by chance, and Hubert and Arabie (1985) introduced a corrected-for-chance version of the Rand index, which is usually known as the adjusted Rand index (ARI).
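The pair-counting definitions above can be made concrete with a minimal sketch. It assumes the two clusterings are given as equal-length label sequences and uses the standard Hubert and Arabie formula written in terms of pair counts. The function names `pair_counts`, `rand_index` and `adjusted_rand_index` are illustrative, not taken from the cited papers, and returning 1.0 in the degenerate case where the denominator vanishes is a convention assumed here.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Return (pairs together in both, pairs together in A, pairs together in B, all pairs)."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    # Contingency table: number of objects in cluster i of A and cluster j of B.
    contingency = Counter(zip(labels_a, labels_b))
    together_both = sum(comb(c, 2) for c in contingency.values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return together_both, together_a, together_b, comb(n, 2)


def rand_index(labels_a, labels_b):
    """Proportion of object pairs on which the two clusterings agree."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    # Pairs separated in both clusterings, by inclusion-exclusion.
    apart_both = total - in_a - in_b + both
    return (both + apart_both) / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (Hubert and Arabie, 1985)."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total   # expected agreement with fixed cluster sizes
    maximum = (in_a + in_b) / 2      # normalising upper bound
    if maximum == expected:
        # Degenerate case (both all-singletons or both one cluster):
        # the clusterings coincide; returning 1.0 is an assumed convention.
        return 1.0
    return (both - expected) / (maximum - expected)
```

For instance, `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` gives 1/3 while `adjusted_rand_index` on the same input gives -0.5, which illustrates how the chance correction can push the index below zero.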
Explicit agreement extremes for a $2\times2$ table with given marginals
Given two different clusterings of a data set, many measures have been proposed to quantify their degree of concordance. A recent review of a representative number of them can be found in Meilă (2016). These measures are usually categorized into three classes: those based on inspecting the assignments of data pairs in both clusterings, those involving some cluster matching between the two clusterings, and those relying on information theoretic criteria. This paper concerns the first one of these classes. In fact, some of the most popular and widely used similarity measures, such as the Rand index, the Jaccard index, or the Fowlkes-Mallows index, belong to this class of pair-based similarities, but it should be noted that there is a plethora of them, as explored in Albatineh, Niewiadomska-Bugaj and Mihalko (2006), Warrens (2008) or Warrens and van der Hoef (2019).
A close-up comparison of the misclassification error distance and the adjusted Rand index for external clustering evaluation
Indeed, it was the recommended choice in the seminal paper of Milligan and Cooper (1986), where five criteria were examined regarding the task of comparison of hierarchical clustering algorithms across different hierarchy levels. Their recommendation is based on the fact that, for the null case data (i.e., for a synthetic sample with randomly assigned class labels, showing no significant cluster structure), the ARI was the only index that produced a flat response curve across hierarchy levels, with mean values close to zero, hence indicating that the agreement between the randomly assigned labels and the algorithm solution was due to chance. Another popular measure for clustering validation, not included in Milligan and Cooper's study, is the misclassification error distance (MED). Its first appearance in the literature dates back at least to Régnier (1965), where it was introduced as a distance between partitions of a finite set, and it was called transfer distance. It is also referred to as partition distance (Gusfield, 2002) or maximum matching distance (Rossi, 2015).
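The excerpt does not spell out how the MED is computed, so the following is only a sketch under a stated assumption: the MED is taken to be the smallest proportion of objects whose labels disagree, minimised over all one-to-one matchings between the clusters of the two clusterings (the unnormalised count would correspond to the transfer distance). The function name `misclassification_error_distance` is illustrative, and the brute-force search over matchings is chosen for clarity; it is practical only for a handful of clusters, and an assignment solver would be the usual replacement for larger problems.

```python
from collections import Counter
from itertools import permutations


def misclassification_error_distance(labels_a, labels_b):
    """Assumed definition: min over cluster matchings of the fraction of mismatched objects."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if n == 0:
        raise ValueError("at least one object is required")
    clusters_a = list(set(labels_a))
    clusters_b = list(set(labels_b))
    # The distance is symmetric, so put the clustering with fewer clusters first;
    # every cluster of A is then matched to a distinct cluster of B.
    if len(clusters_a) > len(clusters_b):
        labels_a, labels_b = labels_b, labels_a
        clusters_a, clusters_b = clusters_b, clusters_a
    contingency = Counter(zip(labels_a, labels_b))
    best_matched = 0
    # Brute force over injective maps from clusters of A to clusters of B.
    for image in permutations(clusters_b, len(clusters_a)):
        matched = sum(contingency[(ca, cb)] for ca, cb in zip(clusters_a, image))
        best_matched = max(best_matched, matched)
    return (n - best_matched) / n
```

With this definition, relabelling the clusters leaves the distance unchanged: `misclassification_error_distance([0, 0, 1, 1], [1, 1, 0, 0])` is 0, whereas comparing `[0, 0, 1, 1]` with `[0, 1, 0, 1]` gives 0.5.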