 uniformity


The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency

Wiredu-Aidoo, Randolph

arXiv.org Machine Learning

In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.
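The instability that the abstract attributes to label-defined measures can be illustrated with a small sketch. It does not implement MAS, whose definition the abstract does not give. It only contrasts a label-centric measure (normalized entropy of cluster proportions, chosen here as an assumed example) with a simple point-weighted quantity (the expected share of the data held by the cluster of a uniformly drawn point). The function names `normalized_entropy` and `point_weighted_mass` are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Label-centric uniformity: Shannon entropy of cluster proportions over log(K).

    Assumed example of a measure defined over labels; not taken from the paper.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k < 2:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def point_weighted_mass(labels):
    """Point-centric quantity: expected fraction of the data lying in the
    cluster of a uniformly drawn point. Illustrative only; this is not MAS."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) ** 2 for c in counts.values())


# Two partitions with the same bulk structure; the second splits off one point.
balanced = [0] * 50 + [1] * 50
fragmented = [0] * 50 + [1] * 49 + [2] * 1

for name, labels in [("balanced", balanced), ("fragmented", fragmented)]:
    print(name, round(normalized_entropy(labels), 3), round(point_weighted_mass(labels), 3))
```

In this toy case, moving a single point into its own cluster changes the number of labels from two to three, which shifts the label-centric score substantially, while the point-weighted quantity barely moves. That is the kind of fragment sensitivity the abstract describes as a motivation for a point-centric construction.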


Mitigating the Popularity Bias of Graph Collaborative Filtering: A Dimensional Collapse Perspective

Neural Information Processing Systems

Graph Collaborative Filtering (GCF) is widely used in personalized recommendation systems. However, GCF suffers from a fundamental problem where features tend to occupy the embedding space inefficiently (by spanning only a low-dimensional subspace).
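One common way to check whether embeddings span only a low-dimensional subspace is to look at the singular value spectrum of the centered embedding matrix. The sketch below computes an entropy-based effective rank. It is an assumed diagnostic for illustration, not the method of the paper, and the function name `effective_rank` and the synthetic data are hypothetical.

```python
import numpy as np


def effective_rank(embeddings):
    """Exponential of the Shannon entropy of the normalized singular values.

    Values near the embedding dimension suggest the space is used evenly;
    values far below it suggest the embeddings lie close to a subspace.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0  # all embeddings identical
    p = s / total
    p = p[p > 0]  # avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))


rng = np.random.default_rng(0)
# Synthetic stand-ins for learned embeddings (assumption, not real GCF output).
full = rng.normal(size=(500, 16))                                  # uses all 16 dimensions
collapsed = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 16))   # confined to a 2-D subspace

print("full:", effective_rank(full))
print("collapsed:", effective_rank(collapsed))
```

The collapsed matrix has 16 coordinates per row but an effective rank of at most about two, which matches the notion of features occupying the embedding space inefficiently.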







Appendix: Combating Representation Learning Disparity with Geometric Harmonization

Neural Information Processing Systems

We provide our source code to ensure the reproducibility of our experimental results. Below we summarize several critical aspects w.r.t. reproducibility. The datasets we use are all publicly accessible and are introduced in Appendix E.1. For long-tailed subsets, we strictly follow previous work [29] on CIFAR-100-LT to avoid bias attributable to sampling randomness. On ImageNet-LT and Places-LT, we employ the widely used data split first introduced in [44]. All experiments are conducted on an NVIDIA GeForce RTX 3090 with Python 3.7 and PyTorch 1.7.