AITopics | k-medoid

k-medoid

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

K-Medoids For K-Means Seeding

Neural Information Processing Systems, Nov-21-2025, 15:56:28 GMT

We show experimentally that the algorithm CLARANS of Ng and Han (1994) finds better K-medoids solutions than the Voronoi iteration algorithm of Hastie et al. (2001). This finding, along with the similarity between the Voronoi iteration algorithm and Lloyd's K-means algorithm, motivates us to use CLARANS as a K-means initializer. We show that CLARANS outperforms other algorithms on 23/23 datasets with a mean decrease over k-means++ of 30% for initialization mean squared error (MSE) and 3% for final MSE. We introduce algorithmic improvements to CLARANS which improve its complexity and runtime, making it a viable initialization scheme for large datasets.
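The seeding idea above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses squared Euclidean distance for the medoid cost so that the medoid objective is on the same scale as the k-means MSE, it recomputes the full cost for every candidate swap instead of using the faster updates the paper introduces, and the names `clarans_like`, `medoid_cost`, `lloyd`, `num_local` and `max_neighbor` are hypothetical (the last two loosely mirror the numlocal and maxneighbor parameters of Ng and Han's CLARANS).

```python
import numpy as np

def medoid_cost(X, medoids):
    """Sum over points of the squared Euclidean distance to the nearest medoid."""
    d2 = ((X[:, None, :] - X[medoids][None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def clarans_like(X, k, num_local=2, max_neighbor=50, seed=0):
    """Simplified CLARANS-style randomized swap search for K-medoids (hypothetical sketch).

    A neighbor of the current medoid set differs from it in exactly one medoid.
    The first improving random swap is accepted; after max_neighbor consecutive
    non-improving tries the set is treated as a local optimum. The search is
    restarted num_local times and the best local optimum is kept.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    best, best_cost = None, np.inf
    for _ in range(num_local):
        current = rng.choice(n, size=k, replace=False)
        cost = medoid_cost(X, current)
        tries = 0
        while tries < max_neighbor:
            trial = current.copy()
            trial[rng.integers(k)] = rng.integers(n)  # swap one medoid for a random point
            if len(set(trial.tolist())) < k:
                tries += 1  # the swapped-in point was already a medoid
                continue
            trial_cost = medoid_cost(X, trial)
            if trial_cost < cost:
                current, cost, tries = trial, trial_cost, 0
            else:
                tries += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return best, best_cost

def lloyd(X, centers, max_iter=100):
    """Lloyd's k-means started from the given centers; returns centers, labels, MSE."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    mse = ((X - centers[labels]) ** 2).sum(axis=1).mean()
    return centers, labels, mse

# Usage on three synthetic Gaussian blobs: medoids seed k-means.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
medoids, cost = clarans_like(X, k=3)
centers, labels, mse = lloyd(X, X[medoids])
print(cost / len(X), mse)  # initialization MSE, final MSE
```

Because Lloyd's iterations never increase the k-means objective, the final MSE of this sketch is at most the initialization MSE of the medoids; the 30% and 3% figures quoted above are the paper's results and are not reproduced by this toy example.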

k-means seeding, k-medoid, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

005413e90d003d13886019607b037f52-Paper-Conference.pdf

Neural Information Processing Systems, Oct-11-2025, 00:03:59 GMT

coreset, dataset, fwc

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
Banking & Finance > Credit (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Nagori, Aditya, Gautam, Ayush, Wiens, Matthew O., Nguyen, Vuong, Mugisha, Nathan Kenya, Kabakyenga, Jerome, Kissoon, Niranjan, Ansermino, John Mark, Kamaleswaran, Rishikesan

arXiv.org Artificial Intelligence, Aug-5-2025

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP- and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and distinctiveness. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
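The Silhouette Score used above to compare embedding-based and classical clusterings can be computed directly from pairwise distances. The NumPy sketch below assumes Euclidean distance (the abstract does not state which metric the study used), and the function is a hypothetical from-scratch version; scikit-learn's `sklearn.metrics.silhouette_score` provides an optimized implementation of the same metric.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient under Euclidean distance.

    For point i: a = mean distance to the other members of its cluster,
    b = lowest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette requires at least two clusters")
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_other = same.sum() - 1  # cluster size excluding point i
        if n_other == 0:
            continue  # singleton cluster: s = 0 by convention
        a = D[i, same].sum() / n_other  # D[i, i] == 0, so including i in the sum is harmless
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s.mean()

X = np.array([[0.0], [0.1], [10.0], [10.1]])
print(silhouette_score(X, [0, 0, 1, 1]))  # well-separated clusters: close to 1
print(silhouette_score(X, [0, 1, 0, 1]))  # mixed-up clusters: negative
```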

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2505.09805

Country:

North America > United States (0.46)
Africa > Uganda (0.29)
North America > Canada > British Columbia (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reviews: K-Medoids For K-Means Seeding

Neural Information Processing Systems, Oct-8-2024, 05:57:56 GMT

The authors propose to use a particular version of the K-medoids algorithm (CLARANS, which uses iterative swaps to identify the medoids) for initializing k-means and claim that this improves the final clustering quality. The authors have tested their claims on multiple datasets and demonstrated performance improvements. They have also prepared code that will be released after the review process. The paper is easy to read and follow, and the authors have done a good job placing their work in context. I appreciate that the optimizations are presented in a very accessible manner in Section 4. As the authors claim, open-source code is an important contribution.

k-means seeding, k-medoid, review

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Fair Wasserstein Coresets

Xiong, Zikai, Dalmasso, Niccolò, Potluru, Vamsi K., Balch, Tucker, Veloso, Manuela

arXiv.org Machine Learning, Nov-9-2023

Recent technological advancements have given rise to the ability to collect vast amounts of data that often exceed the capacity of commonly used machine learning algorithms. Approaches such as coresets and synthetic data distillation have emerged as frameworks to generate a smaller, yet representative, set of samples for downstream training. As machine learning is increasingly applied to decision-making processes, it becomes imperative for modelers to consider and address biases in the data concerning subgroups defined by factors like race, gender, or other sensitive attributes. Current approaches focus on creating fair synthetic representative samples by optimizing local properties relative to the original samples. These methods, however, are not guaranteed to positively affect the performance or fairness of downstream learning processes. In this work, we present Fair Wasserstein Coresets (FWC), a novel coreset approach which generates fair synthetic representative samples along with sample-level weights to be used in downstream learning tasks. FWC aims to minimize the Wasserstein distance between the original datasets and the weighted synthetic samples while enforcing (an empirical version of) demographic parity, a prominent criterion for algorithmic fairness, via a linear constraint. We show that FWC can be thought of as a constrained version of Lloyd's algorithm for k-medians or k-means clustering. Our experiments, conducted on both synthetic and real datasets, demonstrate the scalability of our approach and highlight the competitive performance of FWC compared to existing fair clustering approaches, even when attempting to enhance the fairness of the latter through fair pre-processing techniques.
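The remark that FWC is a constrained version of Lloyd's algorithm can be illustrated by dropping the fairness constraint. The NumPy sketch below is a deliberate simplification, not FWC itself: it uses the squared Euclidean cost (W2, matching k-means), omits the demographic parity constraint entirely, and the function name `unconstrained_wasserstein_coreset` is hypothetical. It shows why Lloyd's steps appear: with the synthetic points fixed, the Wasserstein-optimal weights are the masses of their Voronoi cells, and with the cells fixed, each synthetic point moves to its cell mean.

```python
import numpy as np

def unconstrained_wasserstein_coreset(X, m, iters=50, seed=0):
    """Weighted support points Z reducing W2^2 between the data and sum_j w_j * delta(Z_j).

    Step 1 (Z fixed): the optimal weights are the fractions of data points whose
    nearest support point is Z_j, since sending every point to its nearest
    support point is an optimal transport plan.
    Step 2 (assignment fixed): moving Z_j to the mean of its cell lowers W2^2.
    Alternating the two steps is Lloyd's k-means. No fairness constraint is applied.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Z = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(iters):
        assign = ((X[:, None, :] - Z[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(m):
            if np.any(assign == j):
                Z[j] = X[assign == j].mean(axis=0)
    assign = ((X[:, None, :] - Z[None]) ** 2).sum(axis=2).argmin(axis=1)
    w = np.bincount(assign, minlength=m) / len(X)     # sample-level weights
    w2_sq = ((X - Z[assign]) ** 2).sum(axis=1).mean()  # W2^2 to the weighted coreset
    return Z, w, w2_sq

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(300, 2)), rng.normal(6, 1, size=(100, 2))])
Z, w, w2_sq = unconstrained_wasserstein_coreset(X, m=10)
print(w.round(3), w2_sq)
```

FWC adds its linear demographic parity constraint on top of this alternation; how that constraint enters each step is described in the paper, not in this sketch.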

dataset, fairlet, fwc

arXiv.org Machine Learning

2311.05436

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry:

Banking & Finance (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs

Michalakopoulos, Vasilis, Sarmas, Elissaios, Papias, Ioannis, Skaloumpakas, Panagiotis, Marinakis, Vangelis, Doukas, Haris

arXiv.org Artificial Intelligence, Oct-31-2023

Load shapes derived from smart meter data are frequently employed to analyze daily energy consumption patterns, particularly in applications like Demand Response (DR). Nevertheless, one of the most important challenges in this endeavor lies in identifying the most suitable clusters of consumers with similar consumption behaviors. In this paper, we present a novel machine learning based framework for optimal load profiling through a real case study, using data from almost 5000 households in London. Four widely used clustering algorithms are applied, specifically K-means, K-medoids, Hierarchical Agglomerative Clustering and Density-based Spatial Clustering. An empirical analysis as well as multiple evaluation metrics are used to assess these algorithms. Following that, we recast the problem as a probabilistic classification problem, with the classifier emulating the behavior of a clustering algorithm, and leverage Explainable AI (xAI) to enhance the interpretability of our solution. According to the clustering algorithm analysis, the optimal number of clusters for this case is seven. However, our methodology shows that two of the clusters, almost 10% of the dataset, exhibit significant internal dissimilarity, and it therefore splits them further to create nine clusters in total. The scalability and versatility of our solution make it an ideal choice for power utility companies aiming to segment their users for more targeted Demand Response programs.

algorithm, k-means, optimal number

arXiv.org Artificial Intelligence

2310.20367

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States (0.04)
Europe > Russia (0.04)

Genre: Research Report (1.00)

Industry: Energy > Power Industry > Utilities (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Time Series Clustering With Random Convolutional Kernels

Marco-Blanco, Jorge, Cuevas, Rubén

arXiv.org Artificial Intelligence, Jul-6-2023

Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
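The abstract does not detail how R-Clustering uses its random kernels. As a hedged illustration of the general idea, the NumPy sketch below follows the random-kernel transform popularized by ROCKET (Dempster et al.): each kernel has a random length, zero-mean Gaussian weights, a uniform bias and an exponentially sampled dilation, and each convolution output is summarized by its maximum and its proportion of positive values (PPV). Padding and input normalization are omitted, the function names are hypothetical, and the resulting feature matrix would still need a clustering step (for example k-means), which is not shown.

```python
import numpy as np

def random_kernels(num_kernels, series_length, seed=0):
    """Sample (weights, bias, dilation) triples in the style of ROCKET (assumed ranges)."""
    if series_length < 11:
        raise ValueError("series must be at least as long as the longest kernel (11)")
    rng = np.random.default_rng(seed)
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # zero-mean weights
        bias = rng.uniform(-1.0, 1.0)
        # dilation = 2**u with u ~ U(0, A), so the dilated kernel still fits in the series
        max_exp = np.log2((series_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        kernels.append((weights, bias, dilation))
    return kernels

def kernel_features(x, weights, bias, dilation):
    """Valid (unpadded) dilated convolution summarized by its max and PPV."""
    span = (len(weights) - 1) * dilation
    out = np.array([np.dot(weights, x[t:t + span + 1:dilation]) + bias
                    for t in range(len(x) - span)])
    return out.max(), (out > 0).mean()

def transform(batch, kernels):
    """Feature matrix of shape (n_series, 2 * n_kernels)."""
    return np.array([[f for w, b, d in kernels for f in kernel_features(x, w, b, d)]
                     for x in batch])

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 100)
batch = np.vstack([np.sin(t) + 0.1 * rng.normal(size=100) for _ in range(5)]
                  + [rng.normal(size=100) for _ in range(5)])
kernels = random_kernels(20, batch.shape[1])
F = transform(batch, kernels)
print(F.shape)  # (10, 40)
```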

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

2305.10457

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Community of Madrid > Madrid (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Agglomerative Hierarchical Clustering with Dynamic Time Warping for Household Load Curve Clustering

AlMahamid, Fadi, Grolinger, Katarina

arXiv.org Artificial Intelligence, Oct-17-2022

Energy companies often implement various demand response (DR) programs to better match electricity demand and supply by offering consumers incentives to reduce their demand during critical periods. Classifying clients according to their consumption patterns enables targeting specific groups of consumers for DR. Traditional clustering algorithms use a standard distance measure to compute the distance between two points. The results produced by clustering algorithms such as K-means, K-medoids, and Gaussian Mixture Models (GMM) depend on the clustering parameters or the initial clusters. In contrast, our methodology uses a shape-based approach that combines Agglomerative Hierarchical Clustering (AHC) with Dynamic Time Warping (DTW) to classify residential households' daily load curves based on their consumption patterns. While DTW seeks the optimal alignment between two load curves, AHC provides realistic initial cluster centers. In this paper, we compare the results with other clustering algorithms such as K-means, K-medoids, and GMM using different distance measures, and we show that AHC using DTW outperforms the other clustering algorithms and needs fewer clusters.
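The DTW component can be illustrated with the classic dynamic-programming recurrence. This NumPy sketch assumes an absolute-difference local cost and no warping window (the abstract specifies neither), and the names `dtw_distance` and `dtw_matrix` are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost and no window."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_matrix(curves):
    """Symmetric matrix of pairwise DTW distances between load curves."""
    k = len(curves)
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            M[i, j] = M[j, i] = dtw_distance(curves[i], curves[j])
    return M

a = [0, 0, 1, 2, 1, 0]  # consumption peak at index 3
b = [0, 1, 2, 1, 0, 0]  # same shape, peak at index 2
print(dtw_distance(a, b))                       # 0.0: DTW aligns the shifted peaks
print(np.abs(np.array(a) - np.array(b)).sum())  # 4: pointwise comparison penalizes the shift
```

A matrix from `dtw_matrix` can be converted to condensed form with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` for agglomerative clustering; which linkage criterion the authors used is not stated in the abstract.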

artificial intelligence, clustering, machine learning

arXiv.org Artificial Intelligence

doi: 10.1109/CCECE49351.2022.9918481

2210.09523

Country:

North America > Canada > Ontario > Middlesex County > London (0.04)
North America > United States (0.04)
Asia > China (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Noise-robust Clustering

Adesunkanmi, Rahmat, Kumar, Ratnesh

arXiv.org Machine Learning, Oct-19-2021

This paper presents noise-robust clustering techniques in unsupervised machine learning. Uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for the noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addresses this problem by extending classical $K$-means and $K$-medoids clustering to data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms first cluster the data distributions and then assign each raw data point to the cluster of its distribution.
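For one-dimensional data the $W_2$ distance has a closed form: the optimal coupling matches sorted samples, so for two empirical distributions with the same number of samples $n$, $W_2 = \sqrt{\tfrac{1}{n}\sum_i (x_{(i)} - y_{(i)})^2}$. The NumPy sketch below uses that closed form inside a simple Voronoi-iteration $K$-medoids over distributions. It assumes 1-D samples with equal counts per distribution, does not implement the paper's ED measure, and the function names are hypothetical.

```python
import numpy as np

def w2_1d(x, y):
    """W2 between two 1-D empirical distributions with equal sample counts (sorted matching)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return np.sqrt(np.mean((x - y) ** 2))

def kmedoids_distributions(samples, k, max_iter=20, seed=0):
    """Voronoi-iteration K-medoids where each item is a distribution given by samples."""
    n = len(samples)
    D = np.array([[w2_1d(samples[i], samples[j]) for j in range(n)] for i in range(n)])
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign each distribution to its nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # new medoid: the member with the smallest total W2 to the rest of its cluster
            new[c] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, D[:, medoids].argmin(axis=1)

# Each raw datum is observed as noisy samples; cluster the distributions, then read off labels.
rng = np.random.default_rng(0)
samples = [rng.normal(mu, 0.5, 200) for mu in (0.0, 0.1, -0.1, 5.0, 5.1, 4.9)]
medoids, labels = kmedoids_distributions(samples, k=2)
print(labels)
```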

k-means, k-means and k-medoid, k-medoid

arXiv.org Machine Learning

2110.08871

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado (0.04)
North America > United States > Iowa > Story County > Ames (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A local approach to parameter space reduction for regression and classification tasks

Romor, Francesco, Tezzele, Marco, Rozza, Gianluigi

arXiv.org Machine Learning, Jul-22-2021

Frequently, the parameter space chosen for shape design or other applications that involve the definition of a surrogate model presents subdomains where the objective function of interest is highly regular or well behaved, so it could be approximated more accurately if restricted to those subdomains and studied separately. The drawback of this approach is the possible scarcity of data in some applications; in those where the available data are moderately abundant relative to the parameter space dimension and the complexity of the objective function, partitioned or local studies are beneficial. In this work we propose a new method called local active subspaces (LAS), which explores the synergies of active subspaces with supervised clustering techniques in order to perform a more efficient dimension reduction in the parameter space for the design of accurate response surfaces. We also developed a procedure to exploit the local active subspace information for classification tasks. Using this technique as a preprocessing step on the parameter space, or on the output space in the case of vectorial outputs, brings remarkable results for the purpose of surrogate modelling.
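Active subspaces, which LAS combines with supervised clustering, are usually estimated from gradient samples: the eigenvectors of $C = \frac{1}{M}\sum_{i=1}^{M} \nabla f(x_i)\,\nabla f(x_i)^\top$ with the largest eigenvalues span the directions along which $f$ varies most on average. The NumPy sketch below computes a global and two local (per-subdomain) active subspaces for a hypothetical piecewise test function. The split by the sign of $x_0$ is an assumed stand-in for the supervised clustering step of LAS, which is not reproduced here.

```python
import numpy as np

def active_subspace(grads, dim):
    """Eigen-decompose the averaged gradient outer product C = mean(g g^T).

    Returns all eigenvalues in descending order and the top-`dim` eigenvectors as columns.
    """
    G = np.asarray(grads, dtype=float)  # shape (n_samples, n_params)
    C = G.T @ G / len(G)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order[:dim]]

def grad_f(x):
    """Gradient of a hypothetical test function: f = x1**2 if x0 < 0, else x2**2."""
    g = np.zeros(3)
    if x[0] < 0:
        g[1] = 2 * x[1]
    else:
        g[2] = 2 * x[2]
    return g

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
grads = np.array([grad_f(x) for x in X])
left = X[:, 0] < 0  # assumed partition standing in for the clustering step

vals_global, _ = active_subspace(grads, 1)
_, W_left = active_subspace(grads[left], 1)
_, W_right = active_subspace(grads[~left], 1)
print(vals_global)                      # two comparable eigenvalues: no single global direction dominates
print(W_left.ravel(), W_right.ravel())  # e1 and e2, up to sign
```

In this toy case a one-dimensional global active subspace cannot capture both behaviors, while each local subspace is exactly one-dimensional, which is the kind of situation where a partitioned study pays off.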

active subspace, dimension, subspace

arXiv.org Machine Learning

2107.10867

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
