Goto

Collaborating Authors

 balanced clustering


Distributed Balanced Clustering via Mapping Coresets

Neural Information Processing Systems

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the mapping coresets'' to tackle this issue. For a wide range of clustering objective functions such as k-center, k-median, and k-means, our techniques give distributed algorithms for balanced clustering that match the best known single machine approximation ratios.


Entropy-Aware Similarity for Balanced Clustering: A Case Study with Melanoma Detection

arXiv.org Artificial Intelligence

Clustering data is an unsupervised learning approach that aims to divide a set of data points into multiple groups. It is a crucial yet demanding subject in machine learning and data mining. Its successful applications span various fields. However, conventional clustering techniques necessitate the consideration of balance significance in specific applications. Therefore, this paper addresses the challenge of imbalanced clustering problems and presents a new method for balanced clustering by utilizing entropy-aware similarity, which can be defined as the degree of balances. We have coined the term, entropy-aware similarity for balanced clustering (EASB), which maximizes balance during clustering by complementary clustering of unbalanced data and incorporating entropy in a novel similarity formula that accounts for both angular differences and distances. The effectiveness of the proposed approach is evaluated on actual melanoma medial data, specifically the International Skin Imaging Collaboration (ISIC) 2019 and 2020 challenge datasets, to demonstrate how it can successfully cluster the data while preserving balance. Lastly, we can confirm that the proposed method exhibited outstanding performance in detecting melanoma, comparing to classical methods.


Distributed Balanced Clustering via Mapping Coresets

Neural Information Processing Systems

Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the balanced clustering'' problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound distributed algorithm remains an open problem. In the present paper we develop a general framework based on mapping coresets'' to tackle this issue. For a wide range of clustering objective functions such as k-center, k-median, and k-means, our techniques give distributed algorithms for balanced clustering that match the best known single machine approximation ratios.


Balanced Clustering with Least Square Regression

AAAI Conferences

Clustering is a fundamental research topic in data mining. A balanced clustering result is often required in a variety of applications. Many existing clustering algorithms have good clustering performances, yet fail in producing balanced clusters. In this paper, we propose a novel and simple method for clustering, referred to as the Balanced Clustering with Least Square regression (BCLS), to minimize the least square linear regression, with a balance constraint to regularize the clustering model. In BCLS, the linear regression is applied to estimate the class-specific hyperplanes that partition each class of data from others, thus guiding the clustering of the data points into different clusters. A balance constraint is utilized to regularize the clustering, by minimizing which can help produce balanced clusters. In addition, we apply the method of augmented Lagrange multipliers (ALM) to help optimize the objective model. The experiments on seven real-world benchmarks demonstrate that our approach not only produces good clustering performance but also guarantees a balanced clustering result.