A Computational Approach to Improving Fairness in K-means Clustering
Zhou, Guancheng, Xu, Haiping, Xu, Hongkang, Li, Chenyu, Yan, Donghui
arXiv.org Artificial Intelligence
Clustering is an important problem in data mining. It aims to split the data into groups such that data points in the same group are similar, while points in different groups are dissimilar under a given similarity metric. Clustering has been successfully applied in many practical settings, such as data grouping in exploratory data analysis, search-result categorization, and market segmentation. Clustering results are often used for further analysis or interpretation. However, directly applying the results of standard clustering algorithms may raise fairness issues: a cluster may favor data points from one of the subpopulations, i.e., contain disproportionately more of its points. One example of …

Figure 1: Illustration of the fairness issue in clustering. Points of different colors indicate different values of a sensitive variable, e.g., gender, where blue indicates male and red female. Cluster 1 is dominated by females, while Cluster 2 is dominated by males. Points marked with an arrow are those whose cluster assignment we might switch to make the clusters less dominated by one subpopulation.
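The abstract does not spell out the paper's algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' method. It runs plain Lloyd's k-means with NumPy, measures unfairness with the per-cluster share of one group and the min-ratio "balance" score often used in the fair-clustering literature, and then imitates the arrows in Figure 1 with a hypothetical greedy heuristic: switch one point at a time, choosing the switch that lowers a count-based fairness gap at the smallest increase in squared distance to its center. The function names (`kmeans`, `group_shares`, `balance`, `fairness_gap`, `greedy_rebalance`), the binary `groups` encoding (1 = "red", 0 = "blue"), and the synthetic two-blob data are all illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    return labels, centers


def group_shares(labels, groups, k):
    """Fraction of each cluster's members that belong to group 1."""
    return np.array([groups[labels == j].mean() if np.any(labels == j) else np.nan
                     for j in range(k)])


def balance(labels, groups, k):
    """Min over clusters of min(n0/n1, n1/n0): 1.0 is perfectly mixed, 0.0 means a one-group cluster."""
    worst = 1.0
    for j in range(k):
        n1 = int(groups[labels == j].sum())
        n0 = int(np.sum(labels == j)) - n1
        worst = min(worst, 0.0 if min(n0, n1) == 0 else min(n0 / n1, n1 / n0))
    return worst


def fairness_gap(labels, groups, k, p):
    """Sum over clusters of |(# group-1 members) - p * (cluster size)|; 0 when every cluster matches share p."""
    return sum(abs(groups[labels == j].sum() - p * np.sum(labels == j)) for j in range(k))


def greedy_rebalance(X, labels, centers, groups, max_moves=10):
    """Hypothetical heuristic: repeatedly switch the single point whose reassignment lowers
    fairness_gap at the smallest increase in squared distance (centers held fixed)."""
    labels = labels.copy()
    k, p = len(centers), groups.mean()
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    for _ in range(max_moves):
        gap = fairness_gap(labels, groups, k, p)
        best = None  # (extra cost, point index, destination cluster)
        for i in range(len(X)):
            src = labels[i]
            if np.sum(labels == src) == 1:
                continue  # never empty a cluster
            for dst in range(k):
                if dst == src:
                    continue
                trial = labels.copy()
                trial[i] = dst
                if fairness_gap(trial, groups, k, p) < gap - 1e-9:
                    cost = d[i, dst] - d[i, src]
                    if best is None or cost < best[0]:
                        best = (cost, i, dst)
        if best is None:
            break  # no single switch improves fairness
        labels[best[1]] = best[2]
    return labels


# Synthetic data mirroring Figure 1: two well-separated blobs, each dominated by one group.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(20, 2)),
               rng.normal([4.0, 0.0], 0.5, size=(20, 2))])
groups = np.array([1] * 16 + [0] * 4 + [1] * 4 + [0] * 16)  # 1 = "red", 0 = "blue"

labels, centers = kmeans(X, k=2)
new_labels = greedy_rebalance(X, labels, centers, groups, max_moves=6)
print("group-1 share per cluster, before:", np.round(group_shares(labels, groups, 2), 2))
print("group-1 share per cluster, after: ", np.round(group_shares(new_labels, groups, 2), 2))
print("balance before/after:", balance(labels, groups, 2), balance(new_labels, groups, 2))
```

Holding the centers fixed and switching one point per step keeps the sketch short and makes the fairness/cost trade-off explicit; an actual method might instead recompute centers after each switch or optimize a weighted combination of clustering cost and a fairness term.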
Jun-3-2025